A method for locating a smooth round in a sports video based on group motion information
Using the pre-trained multi-object tracking network model TransTrack and the watershed algorithm, together with a motion change index and threshold settings, the method addresses the accuracy and efficiency problems of smooth round localization in sports videos and segments multi-person sports videos efficiently.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- BEIHANG UNIV
- Filing Date
- 2022-08-30
- Publication Date
- 2026-06-23
AI Technical Summary
Existing temporal action localization methods struggle to distinguish smooth rounds from non-smooth rounds in sports videos. They also require extensive manual annotation and time-consuming training, and are ill-suited to scenes in which several people perform continuous actions simultaneously.
The pre-trained multi-object tracking network model TransTrack produces athlete tracking results, which are represented as a matrix from which the intensity of each athlete's movement is computed. A motion change index built from these intensities is then combined with the watershed algorithm and threshold settings to locate smooth rounds accurately.
The method improves the localization accuracy and computational efficiency for smooth rounds, reduces computational resource overhead, and yields feature representations in which smooth and non-smooth rounds are easily distinguishable.

Figure CN115393769B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of video processing technology in computer vision, and more particularly to a method for smooth round localization in sports videos based on group motion information.
Background Technology
[0002] In net sports such as volleyball, tennis, and badminton, an uninterrupted sequence of offensive and defensive transitions between the opposing sides is called a "smooth round". Segmenting smooth rounds from sports videos not only filters out irrelevant footage and shortens the video, but also lets the segmented video efficiently support downstream tasks such as highlight replays, team tactical analysis, and statistical analysis. Smooth rounds are typically segmented with temporal action localization methods.
[0003] Existing temporal action localization methods are typically based on deep learning and can accurately locate the interval in which a single person's action occurs in a video. However, they require substantial manual effort for data annotation during data acquisition, and model training is time-consuming. Furthermore, the smooth rounds in sports videos that this invention targets usually involve multiple people performing continuously changing actions simultaneously, a scenario for which temporal action localization methods are often unsuitable. Smooth and non-smooth rounds in a video are not readily represented as distinguishable features, and an unsuitable localization method for smooth rounds directly degrades localization accuracy.
Summary of the Invention
[0004] In view of this, embodiments of the present invention provide a method for locating smooth rounds in sports videos based on group motion information, in order to solve the technical problems in the prior art that smooth and non-smooth rounds in a video cannot be represented as easily distinguishable features and that existing methods are ill-suited to locating smooth rounds.
[0005] This invention provides a method for smooth round localization in sports videos based on group motion information, comprising:
[0006] S1: obtain the motion change index (MVI) of the current video and use it as group motion information that measures the intensity of the athletes' movements in the sports video.
[0007] S1 includes:
[0008] S11: based on the pre-trained multi-object tracking network model TransTrack, obtain tracking results on the test set of the Rally database;
[0009] S12: based on the tracking results, represent the athlete tracking results of each sample video as a matrix;
[0010] S13: based on the matrix, obtain the motion intensity of a single target in a given frame of the video;
[0011] S14: based on the single-target motion intensities, merge the motion intensities of all targets in each frame to obtain the motion change index of the current video;
[0012] S2: locate smooth rounds in the video based on the motion change index, thereby segmenting the sports video.
[0013] Further, S11 includes:
[0014] The TransTrack multi-object tracking network model is pre-trained in a multi-object tracking database.
[0015] Multi-object tracking ground truth is annotated on the training set of the Rally database and used to fine-tune the multi-object tracking network model, so that the model can track all targets in the sports video; tracking results are then obtained on the test set of the Rally database.
[0016] Further, S13 includes:
[0017] Let $x_n^c$ denote the data in the $n$th row of the matrix, whose frame number is $c$. If $x_{n+i}^{c+j}$ exists and $x_n^c$ and $x_{n+i}^{c+j}$ belong to the same target, the change over the $j$-frame interval is normalized to a 1-frame interval, and the average over all $k$ interval lengths gives the motion intensity $I_n^c$ of the individual target corresponding to $x_n^c$, where $x_{n+i}^{c+j}$ denotes the data in the $(n+i)$th row of the matrix, whose frame number is $c+j$.
[0018] Furthermore, the expression for the motion intensity of the individual target is as follows:
[0019]
$$I_n^c=\frac{1}{k}\sum_{j=1}^{k}\frac{\left|x_{n+i}^{c+j}-x_n^c\right|}{j}$$
[0020] where $I_n^c$ is the motion intensity of a single target, $x_n^c$ is the data in the $n$th row of the matrix, $x_{n+i}^{c+j}$ is the data in the $(n+i)$th row of the matrix with frame number $c+j$, $|\cdot|$ denotes the absolute value, and $\sum$ denotes summation.
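As a small worked illustration of the motion intensity expression, assume (hypothetically) that each matrix entry is a scalar position, that $k = 2$, and that the same target sits at 0 in frame $c$, at 3 in frame $c+1$, and at 11 in frame $c+2$:

$$I_n^c=\frac{1}{2}\left(\frac{|3-0|}{1}+\frac{|11-0|}{2}\right)=\frac{1}{2}\left(3+5.5\right)=4.25$$

Dividing by $j$ keeps a displacement measured over a longer gap from outweighing a one-frame displacement.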
[0021] Furthermore, the expression for the motion change index of the current video in S14 is as follows:
[0022]
[0023]
[0024] Here, $MVI_c$ denotes the motion change index of frame $c$, $\cup$ denotes merging all items, $\mathrm{Sort}(\cdot)$ denotes sorting the elements, and $MVI$ denotes the motion change index of the current video.
[0025] Further, S2 includes:
[0026] Based on the watershed algorithm, water level thresholds and the shortest smooth round duration are set to obtain smooth round localization results.
[0027] Based on the smooth round positioning results, segmented sports videos are obtained.
[0028] Furthermore, the setting expressions for the water level line and the interval to be located are as follows:
[0029] WL = max(MVI) * Threshold
[0030] Segment>duration
[0031] Where WL represents the water level, max(·) represents the maximum value, * represents multiplication, Segment represents the interval to be located, and duration represents the shortest smooth round duration.
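For a rough sense of scale, take hypothetical values $\max(MVI)=40$ and $Threshold=0.2$ (one of the threshold values tried in Example 1), with the 90-frame shortest duration used there:

$$WL=40\times0.2=8$$

A candidate interval spanning 75 frames then fails $Segment > duration$ and is discarded, while one spanning 120 frames is kept.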
[0032] Furthermore, the choice of the shortest smooth round duration is evaluated using interval intersection-over-union (IoU) accuracy, the metric used in the temporal action localization task.
[0033] Furthermore, the expression for the intersection-union ratio is as follows:
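[0034]
$$IoU=\frac{\left|Segment\cap Rally_{gt}\right|}{\left|Segment\cup Rally_{gt}\right|}$$
[0035] where $\cap$ denotes the intersection, $\cup$ denotes the union, $Segment$ denotes the located interval, and $Rally_{gt}$ denotes the ground-truth smooth round interval.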
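A worked instance with hypothetical frame intervals $Segment=[100,250]$ and $Rally_{gt}=[120,300]$:

$$IoU=\frac{250-120}{300-100}=\frac{130}{200}=0.65$$

Because the two intervals overlap, their union is the single span from frame 100 to frame 300.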
[0036] Furthermore, in step S11, the multi-target tracking network model TransTrack is fine-tuned using deep learning. The parameters in the TransTrack model are fine-tuned through training. The parameters specifically include the encoder, decoder, and multilayer perceptron (MLP).
[0037] The beneficial effects of the embodiments of the present invention compared with the prior art are as follows:
[0038] 1. This invention obtains more accurate athlete tracking results by pre-training the multi-target tracking network model used in existing tracking methods, thereby improving the model's reusability and reducing computational resource consumption;
[0039] 2. This invention represents the athlete tracking results in the video in matrix form, thereby obtaining features that make smooth rounds and non-smooth rounds easily distinguishable;
[0040] 3. This invention defines the motion intensity of a single target in a frame, which is intuitive and well suited to smooth round localization;
[0041] 4. This invention performs smooth round localization directly on the merged motion intensities of all targets in each frame, thereby improving computational efficiency.
Attached Figure Description
[0042] To more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0043] Figure 1 is a flowchart of the smooth round localization method in sports videos based on group motion information provided by the present invention;
[0044] Figure 2 is a schematic diagram of a localized range contained in a smooth round, provided by the present invention;
[0045] Figure 3 is a schematic diagram visualizing the MVI of the current video, provided by the present invention;
[0046] Figure 4 is a schematic diagram of smooth round localization provided by the present invention;
[0047] Figure 5 is a visualization of the segmentation results provided by the present invention.
Detailed Implementation
[0048] In the following description, specific details such as particular system architectures and techniques are set forth for illustrative purposes and not for limitation, in order to provide a thorough understanding of the embodiments of the invention. However, those skilled in the art will understand that the invention can be implemented in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so as not to obscure the description of the invention with unnecessary detail.
[0049] The following will describe in detail, with reference to the accompanying drawings, a method for smooth round positioning in sports videos based on group motion information according to an embodiment of the present invention.
[0050] Figure 1 is a flowchart of the smooth round localization method in sports videos based on group motion information provided by the present invention. As shown in Figure 1, the smooth round localization method for sports videos includes:
[0051] S1: obtain the motion change index (MVI) of the current video and use it as group motion information that measures the intensity of the athletes' movements in the sports video.
[0052] Figure 2 is a schematic diagram of a localized range contained in a smooth round, provided by the present invention;
[0053] S1 includes:
[0054] S11, based on the pre-trained multi-target tracking network model TransTrack, obtains tracking results in the test set of the Rally database;
[0055] TransTrack is a general multi-target tracking method, used in this method to obtain the athlete's position information at each moment. Other multi-target tracking methods can also be used.
[0056] The TransTrack multi-object tracking network model is pre-trained in a multi-object tracking database.
[0057] Multi-object tracking ground truth is annotated on the training set of the Rally database and used to fine-tune the multi-object tracking network model, so that the model can track all targets in sports videos. Tracking results are then obtained on the test set of the Rally database.
[0058] In some embodiments, the multi-target tracking network model TransTrack is fine-tuned using deep learning: training adjusts the large number of learnable parameters of the TransTrack model used in existing tracking methods, including those of the encoder, decoder, and multilayer perceptron (MLP). For example, the MLP is fine-tuned as follows: assuming the output $y$ is related to the input $x$ by $y = W_{MLP}(x)$, where $W_{MLP}$ denotes the learnable parameters, the deviation between $y$ and the ground truth is propagated back to $W_{MLP}$ by backpropagation, and $W_{MLP}$ is updated repeatedly until $y$ approximates the ground truth. The other parameters are fine-tuned in the same way.
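The backpropagation update described in [0058] can be illustrated on the simplest possible case. The sketch below is not TransTrack's training code: it assumes a single linear layer standing in for $W_{MLP}$, a squared-error loss, and plain gradient descent, all chosen for illustration, and the function name `finetune_linear` is hypothetical. In practice the encoder, decoder, and MLP would be updated together through a deep learning framework's automatic differentiation.

```python
import numpy as np


def finetune_linear(W, x, y_true, lr=0.1, steps=200):
    """Fine-tune a linear map y = W @ x by gradient descent (illustrative sketch).

    W: (d_out, d_in) initial parameters; x: (d_in, n) inputs;
    y_true: (d_out, n) ground truth. Loss: ||W @ x - y_true||^2 / (2n).
    """
    W = W.astype(float).copy()
    n = x.shape[1]
    for _ in range(steps):
        y = W @ x                 # forward pass
        err = y - y_true          # deviation between output and ground truth
        grad = err @ x.T / n      # backpropagated gradient dL/dW
        W -= lr * grad            # update the learnable parameters
    return W
```

Starting from `W = np.zeros((1, 2))` and targets generated by `[[2.0, -1.0]] @ x`, a few hundred steps bring `W` close to those values.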
[0059] In step S11, the multi-target tracking network model TransTrack is fine-tuned using deep learning. The parameters in the TransTrack model are fine-tuned through training. The parameters specifically include the encoder, decoder, and multilayer perceptron (MLP).
[0060] S12, based on the tracking results, represents the athlete tracking results of each sample video as a matrix;
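A minimal sketch of S12 follows. It assumes the tracker writes its results in the MOT Challenge text layout (`frame,id,x,y,w,h,...`, with `x,y` the top-left corner of the box); this layout is an assumption, not something the text specifies. The hypothetical helper `tracks_to_matrix` keeps the frame number, the target ID, and the bounding-box centre, and sorts rows by target and then by frame so that successive rows of one target are successive in time.

```python
import numpy as np


def tracks_to_matrix(lines):
    """Convert MOT-style tracker output lines into an (N, 4) matrix.

    Output columns: frame number, target id, box-centre x, box-centre y.
    Rows are sorted by target id, then by frame number.
    """
    rows = []
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        frame, tid = int(float(fields[0])), int(float(fields[1]))
        x, y, w, h = map(float, fields[2:6])
        rows.append((frame, tid, x + w / 2.0, y + h / 2.0))
    mat = np.array(rows, dtype=float).reshape(-1, 4)
    # np.lexsort uses the last key as the primary key: id first, then frame.
    order = np.lexsort((mat[:, 0], mat[:, 1]))
    return mat[order]
```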
[0061] S13 uses a matrix to obtain the intensity of motion of a single target in a certain frame of a video.
[0062] S13 includes:
[0063] Let $x_n^c$ denote the data in the $n$th row of the matrix, whose frame number is $c$. If $x_{n+i}^{c+j}$ exists and $x_n^c$ and $x_{n+i}^{c+j}$ belong to the same target, the change over the $j$-frame interval is normalized to a 1-frame interval, and the average over all $k$ interval lengths gives the motion intensity $I_n^c$ of the single target corresponding to $x_n^c$, where $x_{n+i}^{c+j}$ denotes the data in the $(n+i)$th row of the matrix, whose frame number is $c+j$.
[0064] The expression for the motion intensity of a single target is as follows:
[0065]
$$I_n^c=\frac{1}{k}\sum_{j=1}^{k}\frac{\left|x_{n+i}^{c+j}-x_n^c\right|}{j}$$
[0066] where $I_n^c$ is the motion intensity of a single target, $x_n^c$ is the data in the $n$th row of the matrix, $x_{n+i}^{c+j}$ is the data in the $(n+i)$th row of the matrix with frame number $c+j$, $|\cdot|$ denotes the absolute value, and $\sum$ denotes summation.
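The single-target motion intensity can be sketched as follows. Several choices are assumptions not fixed by the text: each row's data is the box centre `(cx, cy)`, the absolute value is applied per coordinate and summed, and the average is taken over the gaps $j$ for which a row of the same target actually exists. Rather than searching for the row offset $i$, the hypothetical helper `motion_intensity` looks the partner row up by `(frame, id)`, which finds the same row.

```python
import numpy as np


def motion_intensity(mat, k=4):
    """Per-row motion intensity I_n^c (sketch).

    mat: (N, 4) array with columns frame, id, cx, cy.
    For each row (target t at frame c) and each gap j = 1..k, the row of the
    same target at frame c + j is looked up; if present, its displacement
    |dcx| + |dcy| is divided by j to normalise it to a one-frame gap.
    The intensity is the mean over the gaps that exist (0 if none do).
    """
    lookup = {(int(f), int(t)): (cx, cy) for f, t, cx, cy in mat}
    out = np.zeros(len(mat))
    for n, (f, t, cx, cy) in enumerate(mat):
        steps = []
        for j in range(1, k + 1):
            nxt = lookup.get((int(f) + j, int(t)))
            if nxt is not None:
                steps.append((abs(nxt[0] - cx) + abs(nxt[1] - cy)) / j)
        out[n] = np.mean(steps) if steps else 0.0
    return out
```

With `k=2` and one target at centres (0, 0), (3, 0), (3, 8) in frames 1 to 3, the first row gets (3/1 + 11/2) / 2 = 4.25.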
[0067] This invention represents the athlete tracking results in a video in matrix form, thereby obtaining features that make smooth rounds and non-smooth rounds easily distinguishable.
[0068] Figure 2 is a schematic diagram of a localized range contained in a smooth round, provided by the present invention.
[0069] Figure 3 is a schematic diagram visualizing the MVI of the current video, provided by the present invention.
[0070] S14 obtains the motion change index of the current video based on the motion intensity of a single target in a certain frame and by merging the motion intensity of all targets in each frame.
[0071] The expression for the motion change index of the current video in S14 is as follows:
[0072]
[0073]
[0074] Here, $MVI_c$ denotes the motion change index of frame $c$, $\cup$ denotes merging all items, $\mathrm{Sort}(\cdot)$ denotes sorting the elements, and $MVI$ denotes the motion change index of the current video.
[0075] Figure 4 is a schematic diagram of smooth round localization provided by the present invention;
[0076] Figure 5 is a visualization of the segmentation results provided by the present invention;
[0077] S2: locate smooth rounds in the video based on the motion change index, thereby segmenting the sports video.
[0078] Based on the watershed algorithm, different water level thresholds are set to simulate the immersion (flooding) process of the algorithm;
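Because the merge operator for $MVI_c$ is not spelled out beyond "merging all items", the sketch below assumes that the intensities of all targets in a frame are summed; a mean or maximum would be equally easy to substitute. Returning the per-frame values in frame order then gives the $MVI$ curve of the current video. `motion_change_index` is a hypothetical name.

```python
import numpy as np


def motion_change_index(frames, intensities):
    """Per-frame MVI curve (sketch; summation is an assumed merge).

    frames: frame number of each matrix row; intensities: I_n^c of each row.
    Returns (sorted frame numbers, MVI value for each of those frames).
    """
    frames = np.asarray(frames, dtype=int)
    intensities = np.asarray(intensities, dtype=float)
    uniq, inverse = np.unique(frames, return_inverse=True)  # sorted frame ids
    mvi = np.zeros(len(uniq))
    np.add.at(mvi, inverse, intensities)  # accumulate all targets per frame
    return uniq, mvi
```

Frames in which no target is tracked do not appear in the output; filling them with zeros before thresholding is one option.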
[0079] The desired positioning result is obtained based on the set water level threshold and the shortest smooth round duration.
[0080] The expressions for setting the water level line and the area to be located are as follows:
[0081] WL = max(MVI) * Threshold
[0082] Segment>duration
[0083] Where WL represents the water level, max(·) represents the maximum value, * represents multiplication, Segment represents the interval to be located, and duration represents the shortest smooth round duration.
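A simplified one-dimensional sketch of the water level step follows. It is plain thresholding of the $MVI$ curve rather than a full watershed transform, and it assumes that smooth rounds are the runs of frames whose $MVI$ lies above $WL$; the text does not state which side of the water level a round falls on. The hypothetical helper `locate_rounds` returns half-open `[start, end)` frame-index intervals and keeps only those satisfying $Segment > duration$.

```python
import numpy as np


def locate_rounds(mvi, threshold=0.2, duration=90):
    """Water-level localisation of smooth rounds on an MVI curve (sketch)."""
    mvi = np.asarray(mvi, dtype=float)
    wl = mvi.max() * threshold            # WL = max(MVI) * Threshold
    above = mvi > wl                      # frames above the water level
    # Pad with False so every run has a rising (+1) and a falling (-1) edge.
    padded = np.concatenate(([False], above, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)    # exclusive end indices
    # Keep only runs longer than the shortest smooth round duration.
    return [(int(s), int(e)) for s, e in zip(starts, ends) if e - s > duration]
```

Sweeping `threshold` over 0.1 to 0.3 with `duration=90` would mirror the parameter grid of Example 1, though not necessarily its results.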
[0084] The choice of the shortest smooth round duration is evaluated using interval intersection-over-union (IoU) accuracy, the metric used in the temporal action localization task.
[0085] The expression for intersection-union ratio is as follows:
[0086]
$$IoU=\frac{\left|Segment\cap Rally_{gt}\right|}{\left|Segment\cup Rally_{gt}\right|}$$
[0087] where $\cap$ denotes the intersection, $\cup$ denotes the union, $Segment$ denotes the located interval, and $Rally_{gt}$ denotes the ground-truth smooth round interval.
[0088] The smooth round localization method of this invention achieves satisfactory localization results when segmenting sports videos.
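Interval IoU between a located segment and a ground-truth interval takes only a few lines. This sketch (hypothetical helper `interval_iou`) treats both as `(start, end)` pairs on the same time axis, frames or seconds, and returns 0 when they do not overlap.

```python
def interval_iou(segment, rally_gt):
    """Temporal IoU of a located segment and a ground-truth interval."""
    inter = max(0.0, min(segment[1], rally_gt[1]) - max(segment[0], rally_gt[0]))
    union = (segment[1] - segment[0]) + (rally_gt[1] - rally_gt[0]) - inter
    return inter / union if union > 0 else 0.0
```

For example, `interval_iou((100, 250), (120, 300))` gives 130 / 200 = 0.65.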
[0089] Example 1
[0090] The annotated training data consisted of two video segments from the Rally set, and 372 video segments were used for testing. The interval frame count k was set to 4, the threshold was varied over [0.1, 0.15, 0.2, 0.25, 0.3], and the shortest smooth round duration was set to 90 frames. Intersection over Union (IoU) accuracy from the temporal action localization task was used as the metric. The IoU formula is as follows:
[0091]
$$IoU=\frac{\left|Segment\cap Rally_{gt}\right|}{\left|Segment\cup Rally_{gt}\right|}$$
[0092] In the formula, $\cap$ denotes the intersection, $\cup$ denotes the union, $Segment$ denotes the located interval, and $Rally_{gt}$ denotes the ground-truth smooth round interval.
[0093] The positioning results are shown in Table 1.
[0094]
[0095] In the table, mAP (mean average precision) denotes the average precision of the temporal action localization task averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05.
[0096] Figure 5 visualizes the segmentation results. In Figure 5, the first row shows a video 14.7 seconds long, the second row shows the ground-truth smooth round intervals, and the third row shows the smooth rounds located by the present invention.
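The mAP definition in [0095] can be sketched for a single class as follows. Assumptions: each located smooth round carries a confidence score (for instance its mean MVI, which the text does not specify), predictions are greedily matched in descending score order to unmatched ground-truth intervals, and AP uses the common all-point interpolated precision envelope. The helper names are hypothetical, and published benchmark toolkits may differ in detail.

```python
import numpy as np


def _iou(a, b):
    """Temporal IoU of two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, iou_thr):
    """Single-class AP at one tIoU threshold.

    preds: list of (start, end, score); gts: list of (start, end).
    """
    if not preds or not gts:
        return 0.0
    preds = sorted(preds, key=lambda p: p[2], reverse=True)
    matched = [False] * len(gts)
    tp = np.zeros(len(preds))
    for i, (start, end, _) in enumerate(preds):
        # IoU with every ground truth not yet claimed by a higher-scoring prediction.
        ious = [-1.0 if matched[g] else _iou((start, end), gt)
                for g, gt in enumerate(gts)]
        best = int(np.argmax(ious))
        if ious[best] >= iou_thr:
            matched[best] = True
            tp[i] = 1.0
    cum_tp = np.cumsum(tp)
    precision = cum_tp / np.arange(1, len(preds) + 1)
    recall = cum_tp / len(gts)
    # Interpolated precision: best precision reachable at this recall or higher.
    envelope = np.maximum.accumulate(precision[::-1])[::-1]
    prev_recall = np.concatenate(([0.0], recall[:-1]))
    return float(np.sum((recall - prev_recall) * envelope))


def mean_ap(preds, gts):
    """Average AP over tIoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = np.linspace(0.5, 0.95, 10)
    return float(np.mean([average_precision(preds, gts, t) for t in thresholds]))
```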
[0097] The above embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included within the protection scope of the present invention.
Claims
1. A method for smooth round localization in sports videos based on group motion information, characterized in that it includes: S1: obtaining the motion change index of the current video and using the motion change index as group motion information to measure the intensity of the athletes' movements in the sports video, wherein S1 includes: S11: based on the pre-trained multi-object tracking network model TransTrack, obtaining tracking results on the test set of the Rally database; S12: based on the tracking results, representing the athlete tracking results of each sample video as a matrix; S13: based on the matrix, obtaining the motion intensity of a single target in a given frame of the video; S14: based on the single-target motion intensities, merging the motion intensities of all targets in each frame to obtain the motion change index of the current video; and S2: locating the smooth rounds in the video based on the motion change index in order to segment the sports video. S11 includes: pre-training the TransTrack multi-object tracking network model on a multi-object tracking database; annotating multi-object tracking ground truth on the training set of the Rally database and using it to fine-tune the multi-object tracking network model, so that the model can track all targets in the sports video, and obtaining tracking results on the test set of the Rally database. S13 includes: letting $x_n^c$ denote the data in the $n$th row of the matrix, whose frame number is $c$; if $x_{n+i}^{c+j}$ exists and $x_n^c$ and $x_{n+i}^{c+j}$ belong to the same target, normalizing the data over the $j$-frame interval to a 1-frame interval and then taking the average over all $k$ intervals to obtain the motion intensity $I_n^c$ of the individual target corresponding to $x_n^c$, where $x_{n+i}^{c+j}$ denotes the data in the $(n+i)$th row of the matrix, whose frame number is $c+j$; the expression for the motion intensity of a single target is as follows:
$$I_n^c=\frac{1}{k}\sum_{j=1}^{k}\frac{\left|x_{n+i}^{c+j}-x_n^c\right|}{j}$$
where $I_n^c$ is the motion intensity of a single target, $x_n^c$ is the data in the $n$th row of the matrix, $x_{n+i}^{c+j}$ is the data in the $(n+i)$th row of the matrix with frame number $c+j$, $|\cdot|$ denotes the absolute value, and $\sum$ denotes summation.
2. The smooth round positioning method in sports videos according to claim 1, characterized in that, The S2 includes: Based on the watershed algorithm, water level thresholds and the shortest smooth round duration are set to obtain smooth round localization results. Based on the smooth round positioning results, segmented sports videos are obtained.
3. The smooth round positioning method in sports videos according to claim 2, characterized in that the expressions for setting the water level line and the interval to be located are as follows: WL = max(MVI) * Threshold and Segment > duration, where WL represents the water level line, max(·) represents the maximum value, * represents multiplication, Segment represents the interval to be located, duration represents the shortest smooth round duration, and MVI represents the motion change index of the current video.
4. The smooth round positioning method in sports videos according to claim 3, characterized in that the shortest smooth round duration is evaluated using interval intersection-over-union accuracy in the temporal action localization task.
5. The smooth round positioning method in sports videos according to claim 4, characterized in that the intersection-over-union expression is as follows:
$$IoU=\frac{\left|Segment\cap Rally_{gt}\right|}{\left|Segment\cup Rally_{gt}\right|}$$
where $\cap$ denotes the intersection, $\cup$ denotes the union, $Segment$ denotes the interval to be located, and $Rally_{gt}$ denotes the ground-truth smooth round interval.
6. The method for smooth round positioning in sports videos according to claim 1, characterized in that, In step S11, the multi-target tracking network model TransTrack is fine-tuned using deep learning. The parameters in the TransTrack model are fine-tuned through training. The parameters specifically include the encoder, decoder, and multilayer perceptron (MLP).