Method and system for automatically tracking and generating trajectory of moving target in monitoring video

By combining active detection, perception, prediction, and multi-objective interactive game model with the attention mechanism of self-supervised learning, the problem of continuous tracking and trajectory generation of moving targets in complex monitoring scenarios is solved, achieving efficient and accurate trajectory generation in low-light, multi-obstacle, and multi-objective interactive environments.

CN121884282B · Active · Publication Date: 2026-06-26 · SHANGHAI UNIV

Cites: 2 · Cited by: 0

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SHANGHAI UNIV
Filing Date: 2026-01-20
Publication Date: 2026-06-26

Application Information

Patent Timeline

20 Jan 2026: Application

26 Jun 2026: Publication (CN121884282B)

IPC: G06V20/52; G06V20/40; G06N5/04; G06N3/0895; G06N3/045; G06V10/25; G06V10/82; G06N3/006; G06N3/0455; G06T7/20

CPC: G06V20/52; G06V20/40; G06N5/04; G06N3/0895; G06N3/045; G06V10/25; G06V10/82; G06N3/006

AI Tagging

Technology Topics

Video monitoring; Computer graphics (images)


AI Technical Summary

Technical Problem

Existing technologies struggle to achieve continuous automatic tracking and accurate trajectory generation of moving targets in complex monitoring scenarios (such as low-light, multi-obstacle, and multi-target interactive environments). In particular, trajectory interruptions and identity confusion are prone to occur when there is prolonged occlusion or target interaction.

Method used

An active detection, perception, and prediction mechanism is used to simulate the diffusion process of target position uncertainty. Combined with a multi-objective interactive game model and a self-supervised learning attention mechanism, the target motion and environmental constraints are dynamically modeled to achieve the continuity and accuracy of the trajectory.

Benefits of technology

It achieves continuous output of moving targets in complex occlusion environments, reduces trajectory interruption rate and false detection rate, improves identity preservation stability and trajectory physical rationality, and enhances the robustness and accuracy of multi-target tracking.

✦ Generated by Eureka AI based on patent content.


Figure CN121884282B_ABST


Abstract

The application discloses a method and system for automatic tracking and trajectory generation of moving targets in surveillance video, belonging to the technical fields of computer vision and video surveillance. The method includes: obtaining surveillance video frames and detecting moving targets; when a target is missing from the current frame, constructing a probability distribution field that diffuses over time based on the target's historical state, determining a prediction candidate region in combination with scene occlusion constraints, and actively enhancing detection in that region; treating the targets detected in the current frame and the predicted candidate targets as multiple agents, and performing identity association and state updating by introducing an interactive game model; and using trajectory attention weights obtained through self-supervised learning to guide and smooth the target trajectories, outputting continuous trajectory data with identity labels. The application can maintain target identity stability and trajectory continuity under complex occlusion, dense interaction, and weak detection conditions, significantly reduce trajectory interruptions and mismatch rates, and improve the reliability of automatic moving-target tracking and trajectory generation in surveillance video.


Description

Technical Field

[0001] This invention relates to the fields of computer vision and video surveillance technology, specifically to a method and system for automatic tracking and trajectory generation of moving targets in surveillance videos.

Background Technology

[0002] Automatic tracking and trajectory generation of moving targets in surveillance videos is a key technical challenge in applications such as intelligent security and autonomous driving. Traditional multi-target tracking typically employs a "detection-tracking" paradigm, which involves detecting targets in each frame of the video and then correlating the detection results across frames to form a trajectory. However, due to factors such as similar target appearances, sudden changes in motion, frequent target occlusion, and multi-target interactions in real-world scenarios, existing technologies still face challenges in maintaining target identity continuity and trajectory accuracy. The following briefly describes several relevant existing technologies and their limitations:

[0003] TrackFormer: This is an end-to-end multi-target tracking method based on the Transformer attention mechanism, proposed in recent years. TrackFormer uses a tracking-by-attention paradigm, performing inter-frame association on global features with a Transformer decoder without requiring explicit motion or appearance models. This design simplifies the data association process and achieved state-of-the-art performance at the time. However, because it lacks explicit physical motion modeling, the method may struggle to reliably predict target positions under prolonged occlusion or abrupt motion, and attention alone may not be effective at inferring motion trajectories while the target is invisible.

[0004] Graph Neural Network (GNN) trackers: Some methods model multi-object tracking as a graph optimization problem, where detections are treated as nodes and association hypotheses as edges, and graph theory or graph neural networks are used to solve for global data associations. For example, some studies simultaneously construct appearance graphs and motion graphs to capture the appearance similarity and motion relationships between objects, respectively, and model interactions between objects through global variables. Dynamically updated graph networks can iteratively optimize node and edge information, thereby improving tracking performance. However, graph-based methods are generally computationally complex and require carefully designed update mechanisms; otherwise, direct application of graph networks may fail. Furthermore, existing GNN trackers mainly focus on learned association weights and do not explicitly simulate multi-agent interaction decisions from a game-theoretic perspective, making it difficult to handle identity crossover errors caused by the targets' strategic behavior.

[0005] Social Force Model (SFMM): This is a classic method commonly used in pedestrian motion modeling. It treats pedestrians as force-driven particles, including the driving force towards the destination and the repulsive forces between pedestrians and between pedestrians and obstacles. SFMM has been used to predict pedestrian motion under occlusion conditions to improve the continuity and robustness of tracking. Research shows that motion prediction using SFMM can improve tracking robustness and occlusion handling to some extent. However, SFMM relies on pre-set parameters and rules, making it difficult to handle complex and varied real-world behaviors (such as sudden turns and unconventional paths). Furthermore, pure physics models lack visual re-identification capabilities and cannot guarantee identity preservation when targets have highly similar appearances or poor lighting. Additionally, when there are many targets in a scene, calculations based on continuous force fields can lead to high computational costs, limiting its application in real-time monitoring.
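The social force idea summarized above can be sketched as follows. This is a minimal illustration in the spirit of the classic Helbing-Molnár formulation, not part of the claimed method: the force on a pedestrian is a relaxation of its velocity toward the destination plus exponentially decaying repulsion from other pedestrians. All parameter names and default values (`desired_speed`, `relax_time`, `strength`, `range_`, `radius`) are illustrative assumptions, and obstacle repulsion is omitted for brevity.

```python
import math

Vec = tuple[float, float]

def social_force(pos: Vec, vel: Vec, goal: Vec, others: list[Vec],
                 desired_speed: float = 1.3, relax_time: float = 0.5,
                 strength: float = 2.0, range_: float = 0.3,
                 radius: float = 0.6) -> Vec:
    """Force on one pedestrian: driving force toward goal plus pairwise repulsion.

    All defaults are illustrative assumptions, not calibrated values.
    """
    # Driving force: relax the current velocity toward desired_speed along unit(goal - pos).
    gx, gy = goal[0] - pos[0], goal[1] - pos[1]
    dist_goal = math.hypot(gx, gy) or 1.0  # avoid division by zero at the goal
    ex, ey = gx / dist_goal, gy / dist_goal
    fx = (desired_speed * ex - vel[0]) / relax_time
    fy = (desired_speed * ey - vel[1]) / relax_time

    # Repulsion: grows exponentially as the distance drops below the contact radius.
    for ox, oy in others:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if d == 0.0:
            continue  # coincident points have no defined push direction
        mag = strength * math.exp((radius - d) / range_)
        fx += mag * dx / d
        fy += mag * dy / d
    return fx, fy

def step(pos: Vec, vel: Vec, force: Vec, dt: float = 0.1) -> tuple[Vec, Vec]:
    """Semi-implicit Euler update for a unit-mass pedestrian."""
    vel = (vel[0] + force[0] * dt, vel[1] + force[1] * dt)
    return (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt), vel
```

The repulsion loop visits every other pedestrian, so one simulation step for N pedestrians costs O(N²) force evaluations, which is the scaling concern the paragraph raises for crowded real-time scenes.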

[0006] Ant Colony Optimization (ACO): Ant colony optimization is a biomimetic stochastic optimization algorithm. Some literature uses it to solve multi-dimensional data association problems in multi-target tracking, specifically by simulating ant foraging paths to find the globally optimal trajectory matching combination. Compared to the traditional Hungarian algorithm, ACO offers a more comprehensive approach by exploring a larger solution space to achieve better associations. Some improved ant colony algorithms are claimed to achieve better association results within a reasonable timeframe and are suitable for multi-camera target tracking scenarios. However, ant colony algorithms are inherently heuristic iterative search algorithms, and parameter selection and convergence performance significantly impact their effectiveness. They focus on global data association optimization and do not directly address trajectory prediction and identity preservation under prolonged occlusion, requiring integration with other prediction or recognition methods.
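As a rough illustration of ACO-based data association, the sketch below applies a basic Ant System to a tiny one-to-one track-to-detection assignment: each ant builds an assignment row by row, choosing a free detection with probability proportional to pheromone^alpha times (1/cost)^beta, after which trails evaporate and are reinforced in proportion to solution quality. The function names and parameters (`n_ants`, `n_iter`, `alpha`, `beta`, `rho`, `seed`) are hypothetical choices for this example; `exhaustive_assign` is a brute-force reference included only for checking small cases.

```python
import itertools
import random

def aco_assign(cost: list[list[float]], n_ants: int = 20, n_iter: int = 50,
               alpha: float = 1.0, beta: float = 2.0, rho: float = 0.1,
               seed: int = 0) -> tuple[list[int], float]:
    """Ant System search for a one-to-one assignment; assignment[i] is track i's detection.

    cost[i][j] must be positive; parameter defaults are illustrative.
    """
    rng = random.Random(seed)
    n = len(cost)
    tau = [[1.0] * n for _ in range(n)]                      # pheromone trails
    eta = [[1.0 / (c + 1e-9) for c in row] for row in cost]  # heuristic desirability
    best, best_cost = list(range(n)), float("inf")
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            free = list(range(n))
            perm = []
            for i in range(n):
                w = [tau[i][j] ** alpha * eta[i][j] ** beta for j in free]
                j = rng.choices(free, weights=w)[0]          # roulette-wheel choice
                perm.append(j)
                free.remove(j)
            total = sum(cost[i][perm[i]] for i in range(n))
            tours.append((perm, total))
            if total < best_cost:
                best, best_cost = perm, total
        for row in tau:                                      # evaporation
            for j in range(n):
                row[j] *= 1.0 - rho
        for perm, total in tours:                            # deposit: cheaper tours add more
            for i, j in enumerate(perm):
                tau[i][j] += 1.0 / (total + 1e-9)
    return best, best_cost

def exhaustive_assign(cost: list[list[float]]) -> tuple[list[int], float]:
    """Reference optimum by brute force (only feasible for very small n)."""
    n = len(cost)
    best = min(itertools.permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best), sum(cost[i][best[i]] for i in range(n))
```

For a single linear assignment like this, the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`) returns the exact optimum in polynomial time; ACO is more relevant to the harder multi-dimensional association problems mentioned above, where its sensitivity to `alpha`, `beta` and `rho` is the parameter-selection issue the paragraph describes.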

[0007] In summary, while existing technologies each have their strengths, they may still suffer from tracking interruptions, target identity confusion, and trajectory discontinuities in harsh environments (e.g., poorly lit underground parking lots, or scenarios with multiple obstructions and multi-target interactions). In particular, when a target disappears from the field of view for an extended period or interacts closely with other targets, traditional methods either passively wait for the target to reappear, which interrupts the trajectory, or mismatch identities, which connects trajectories incorrectly. This reveals the shortcomings of current technologies in proactively predicting occlusion recovery and intelligently handling multi-target interactions. Therefore, there is an urgent need for a new method that builds new models on the basis of existing technologies to significantly improve the robustness and accuracy of moving-target tracking and trajectory prediction in complex scenarios.

Summary of the Invention

[0008] Technical Objective: To address the shortcomings of existing technologies, this invention discloses a method and system for automatic tracking and trajectory generation of moving targets in surveillance videos. It improves existing multi-target tracking technologies and enables continuous automatic tracking of moving targets in complex surveillance scenarios (including low-light, multi-obstacle, and multi-target interactive environments) and generates accurate and smooth motion trajectories.

[0009] Technical solution: To achieve the above technical objectives, the present invention adopts the following technical solution:

[0010] A method for automatic tracking and trajectory generation of moving targets in surveillance videos, specifically including the following steps:

[0011] Step 1: Obtain the current frame image of the surveillance video sequence, detect moving targets appearing in the frame, and obtain the initial state information of each target; for subsequent frames of the sequence, inherit the state prediction of the targets already existing in the previous frame for association matching;

[0012] Step 2: For a target detected as missing in the current frame, perform trajectory prediction based on the motion state of the target before it disappeared, simulate the diffusion process of the uncertainty of the target position to estimate the area where it may appear in the current frame, and actively perform target detection in that area to recover the trajectory of the occluded target;

[0013] Step 3: Treat all targets detected in the current frame and targets recovered through prediction as multiple agents, establish an interactive game model that includes all targets, calculate the motion decision strategy of each target based on the relative position and motion trend between each target, and perform target identity association and trajectory matching based on the game decision results of each target, and update the identity label and state parameters of each target.

[0014] Step 4: Smooth and extend the updated target trajectories. In the trajectory generation process, introduce a self-supervised learning attention mechanism to adaptively allocate path attention weights based on the target's historical trajectory and scene constraints, guide trajectory points to converge towards high-weight regions, and generate smooth and continuous trajectories that conform to the target's true movement trend.

[0015] Step 5: Output and store the identity and trajectory coordinates of each target confirmed in the current frame, then process subsequent frames in a loop, applying Steps 2-4 throughout the entire video sequence to achieve continuous automatic tracking and complete trajectory generation of moving targets.
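The five steps can be read as a per-frame loop. The sketch below is a hypothetical skeleton, not the patent's implementation: `detect`, `recover` and `refine` are caller-supplied stand-ins for Steps 1, 2 and 4, and `nearest_associate` is a deliberately simple greedy nearest-neighbour placeholder for Step 3 (it is not the interactive game model). The `Track` fields and the `gate` distance are illustrative assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import Callable

Point = tuple[float, float]

@dataclass
class Track:
    track_id: int
    history: list[Point] = field(default_factory=list)
    missed: int = 0  # consecutive frames without a matched observation

def nearest_associate(tracks: list[Track], points: list[Point],
                      gate: float = 5.0) -> list[Track]:
    """Placeholder for Step 3: greedy nearest-neighbour matching, NOT the game model."""
    next_id = max((tr.track_id for tr in tracks), default=-1) + 1
    unused = list(points)
    for tr in tracks:
        if unused and tr.history:
            best = min(unused, key=lambda p: math.dist(p, tr.history[-1]))
            if math.dist(best, tr.history[-1]) <= gate:
                tr.history.append(best)
                tr.missed = 0
                unused.remove(best)
                continue
        tr.missed += 1
    for p in unused:  # unmatched observations start new tracks
        tracks.append(Track(next_id, [p]))
        next_id += 1
    return tracks

def run_tracker(frames, detect: Callable, recover: Callable, refine: Callable,
                associate: Callable = nearest_associate) -> list[dict[int, Point]]:
    """Run Steps 1-5 on every frame and return, per frame, {track_id: latest point}."""
    tracks: list[Track] = []
    per_frame = []
    for t, frame in enumerate(frames):
        detections = detect(frame)                          # Step 1: detect targets
        recovered = recover(tracks, frame, t)               # Step 2: re-detect missing targets
        tracks = associate(tracks, detections + recovered)  # Step 3: identity association
        for tr in tracks:
            tr.history = refine(tr.history)                 # Step 4: trajectory guidance
        per_frame.append({tr.track_id: tr.history[-1]       # Step 5: output confirmed targets
                          for tr in tracks if tr.missed == 0})
    return per_frame
```

Keeping each step behind a callable mirrors the unit decomposition of the system claims, so a real Step 2 or Step 3 can be swapped in without changing the loop.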

[0016] Preferably, step 2 further includes: when the target existed in the previous frame but was not detected in the current frame, the last location of the target is taken as the initial location, a probability distribution field is constructed around it and diffused over time to simulate the possible location propagation of the target; the diffusion radius of the probability distribution field is increased according to the target's maximum speed prediction, and the probability distribution field is constrained or deflected in combination with environmental obstacle information to predict multiple locations that the target may appear after bypassing the obstruction; the system lowers the detection threshold or uses other sensing methods to actively detect the target near the predicted location, and once detected, it is associated with the aforementioned missing target and its trajectory is restored.
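The threshold-lowering re-detection of [0016] can be sketched as a filter over raw detector output. This is a minimal illustration under stated assumptions: the search region is taken as a disk of radius `v_max * elapsed` around the last known position (a simplification of the diffusion radius, with obstacle deflection left out), and the thresholds `normal_thr = 0.5` and `low_thr = 0.3` are hypothetical values, not figures from the patent.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class Detection:
    x: float
    y: float
    score: float  # detector confidence in [0, 1]

def active_redetect(raw: list[Detection], last_pos: tuple[float, float],
                    elapsed: float, v_max: float,
                    normal_thr: float = 0.5, low_thr: float = 0.3
                    ) -> tuple[list[Detection], Optional[Detection]]:
    """Split raw detections into normally accepted ones and one detection
    recovered for the missing target (or None).

    Inside the reachable disk around last_pos, the lowered threshold low_thr applies.
    """
    radius = v_max * elapsed  # farthest the target can have moved since it vanished
    in_region = [d for d in raw
                 if d.score >= low_thr and math.dist((d.x, d.y), last_pos) <= radius]
    recovered = max(in_region, key=lambda d: d.score, default=None)
    accepted = [d for d in raw if d.score >= normal_thr and d is not recovered]
    return accepted, recovered
```

A weak detection that would be discarded elsewhere is kept only if it lies where the missing target could physically be, which is how lowering the threshold locally avoids raising false detections across the whole frame.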

[0017] Preferably, the multiple target interactive game associations in step 3 include:

[0018] Establish a cost function for each target, which simultaneously considers the degree of deviation of the target from its expected motion as well as the penalty for colliding with or getting too close to other targets;

[0019] The motion strategy of each target in the current frame can be obtained by solving the Nash equilibrium of the multi-agent game or by using an iterative optimization algorithm.

[0020] Predict the position of each target at the next moment based on its motion strategy, and use the predicted positions to assist target identity matching in the current frame;

[0021] When multiple targets interact at close range, the association is determined first based on the yield / avoidance relationship obtained from the game theory model, so as to maintain the consistency of the identities of each target.
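One way to read paragraphs [0018] to [0021] is as iterated best response in a discretized game. In the sketch below, which is an assumption rather than the patent's solver, each target chooses among three hypothetical candidate velocities (keep course, halve speed, stop/yield); its cost is the squared deviation from its desired velocity plus a penalty, weighted by `w`, for coming closer than `d_safe` to any other target's predicted next position. Targets update in turn until none wants to change, at which point the strategies form a pure-strategy Nash equilibrium over the candidate set. `w`, `d_safe` and the candidate set are illustrative.

```python
import math

Vec = tuple[float, float]

def cost(i: int, v: Vec, positions: list[Vec], strategies: list[Vec],
         desired: list[Vec], w: float = 10.0, d_safe: float = 1.0) -> float:
    """Cost for target i of choosing velocity v, given the others' current strategies."""
    deviation = (v[0] - desired[i][0]) ** 2 + (v[1] - desired[i][1]) ** 2
    nxt = (positions[i][0] + v[0], positions[i][1] + v[1])
    penalty = 0.0
    for j, (p, u) in enumerate(zip(positions, strategies)):
        if j != i:
            gap = math.dist(nxt, (p[0] + u[0], p[1] + u[1]))
            penalty += max(0.0, d_safe - gap) ** 2  # only gaps below d_safe are penalised
    return deviation + w * penalty

def candidates(v_des: Vec) -> list[Vec]:
    """Hypothetical discrete strategy set: keep course, halve speed, or stop (yield)."""
    return [v_des, (0.5 * v_des[0], 0.5 * v_des[1]), (0.0, 0.0)]

def best_response_equilibrium(positions: list[Vec], desired: list[Vec],
                              max_iter: int = 50) -> list[Vec]:
    """Sequential best responses until no target wants to deviate (or max_iter)."""
    strategies = list(desired)  # everyone starts with its preferred motion
    for _ in range(max_iter):
        changed = False
        for i in range(len(positions)):  # Gauss-Seidel order: later targets see earlier updates
            best = min(candidates(desired[i]),
                       key=lambda v: cost(i, v, positions, strategies, desired))
            if best != strategies[i]:
                strategies[i] = best
                changed = True
        if not changed:  # fixed point: every strategy is a best response
            break
    return strategies

def predict_next(positions: list[Vec], strategies: list[Vec]) -> list[Vec]:
    """Predicted next-frame positions used to assist identity matching."""
    return [(p[0] + v[0], p[1] + v[1]) for p, v in zip(positions, strategies)]
```

For two targets approaching head-on, `best_response_equilibrium([(0.0, 0.0), (2.0, 0.0)], [(1.0, 0.0), (-1.0, 0.0)])` makes target 0 yield (velocity `(0.0, 0.0)`) while target 1 keeps its course. Which target yields depends on the update order, so a practical system would need an explicit priority or tie-breaking rule; best-response iteration is also not guaranteed to converge in general, hence the `max_iter` cap.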

[0022] Preferably, step 4 specifically includes: providing a trajectory attention model that has been pre-trained in an unsupervised or self-supervised manner; analyzing the historical trajectory sequence of each target to generate the attention weight distribution of future trajectory points; adjusting the original predicted trajectory according to the attention weights, increasing the weight or offset of trajectory points in high-weight directions, and weakening the trajectory offset in low-weight directions, so that the final output trajectory satisfies the smoothness while conforming to scene constraints and target motion laws; wherein, the training of the trajectory attention model utilizes segments of unlabeled video data where the target disappears and reappears or is partially occluded, and iteratively adjusting the trajectory attention model parameters by having the trajectory attention model predict the motion position during the occlusion period and compare it with the actual appearance position.
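The attention-weighted adjustment and the disappear-and-reappear training signal described in [0022] can be illustrated with a deliberately tiny model. In this sketch, which is a hypothetical stand-in rather than the patent's pre-trained network, the next point is the last point plus a scaled dot-product-attention average of past displacements (query: the latest displacement; keys and values: all past displacements). `self_supervised_loss` hides each later point of an unlabeled track and scores the prediction against where the target actually reappears, and `fit_temperature` "trains" the single `temperature` parameter by grid search; all names are illustrative.

```python
import math

Vec = tuple[float, float]

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_next_point(history: list[Vec],
                         temperature: float = 1.0) -> tuple[Vec, list[float]]:
    """Next point = last point + attention-weighted average of past displacements.

    Steps resembling the current heading receive more weight.
    """
    if len(history) < 2:
        raise ValueError("need at least two points")
    disps = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(history, history[1:])]
    q = disps[-1]
    scale = math.sqrt(2.0) * temperature  # sqrt(d_k) with d_k = 2, times a temperature
    weights = softmax([(q[0] * k[0] + q[1] * k[1]) / scale for k in disps])
    sx = sum(w * k[0] for w, k in zip(weights, disps))
    sy = sum(w * k[1] for w, k in zip(weights, disps))
    last = history[-1]
    return (last[0] + sx, last[1] + sy), weights

def self_supervised_loss(track: list[Vec], temperature: float) -> float:
    """Hide each later point in turn and predict it from the points before it.

    No labels are needed: the target is the track's own observed position.
    """
    if len(track) < 4:
        raise ValueError("need at least four points")
    errs = []
    for n in range(3, len(track)):
        pred, _ = attention_next_point(track[:n], temperature)
        errs.append(math.dist(pred, track[n]) ** 2)
    return sum(errs) / len(errs)

def fit_temperature(tracks: list[list[Vec]],
                    grid: tuple[float, ...] = (0.25, 0.5, 1.0, 2.0, 4.0)) -> float:
    """Toy 'training': pick the temperature with the lowest self-supervised loss."""
    return min(grid, key=lambda t: sum(self_supervised_loss(tr, t) for tr in tracks))
```

A learned network would replace the fixed dot-product scoring with trainable projections and add scene-constraint inputs, but the supervision signal stays the same: the reappearance position the tracker observes anyway.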

[0023] An automatic tracking and trajectory generation system for moving targets in surveillance videos is provided to implement the automatic tracking and trajectory generation method for moving targets in surveillance videos as described above. The system includes a video acquisition and preprocessing unit, an active trajectory prediction unit, a multi-objective game association unit, a self-supervised trajectory guidance unit, and a result output unit; wherein:

[0024] The video acquisition and preprocessing unit is used to acquire video frame sequences from the surveillance camera and perform target detection on each frame to obtain initial detection information for multiple targets.

[0025] The active trajectory prediction unit is used to calculate the predicted location region of the missing target in subsequent frames based on the state of the most recent appearance of the missing target when the target is detected to be missing. The active trajectory prediction unit is configured to use a wave diffusion model to simulate the propagation of the uncertainty of the target position, and control the camera or detection algorithm to actively perceive the predicted region and recapture the temporarily lost target.

[0026] The multi-objective game association unit is used to receive the detection information provided by the video acquisition and preprocessing unit and the predicted positions provided by the active trajectory prediction unit, and to perform identity association and state updating for all targets in the current frame. The multi-objective game association unit integrates a game model, which determines the motion trend of each target by solving for the optimal decisions of multiple agents, and performs target-trajectory matching and identity preservation accordingly. When targets occlude or interact with each other, the correct trajectory correspondence is preferentially maintained based on the game model's result.

[0027] The self-supervised trajectory guidance unit is used to smooth and optimize the trajectories of each target after target association is completed. The self-supervised trajectory guidance unit includes an attention calculation module, which uses the target's historical trajectory and scene information to generate attention weights for future trajectory points and adjusts the position of the trajectory points to fit the high-weight area, thereby achieving dynamic guidance of the trajectory. This unit does not require manual calibration and continuously updates the attention model through self-supervised learning to adapt to environmental changes and improve the accuracy of trajectory prediction.

[0028] The result output unit is used to output and record the identity of each tracked target and its continuous motion trajectory data. The continuous motion trajectory data is jointly determined by the multi-objective game association unit and the self-supervised trajectory guidance unit.

[0029] Preferably, the active trajectory prediction unit is configured to: acquire the position and motion vector of the target at the moment the target disappears, calculate the possible diffusion range of the target position using a preset maximum velocity parameter, and use pre-stored scene obstacle layout information to crop or guide the diffusion range to obtain several candidate regions; the active trajectory prediction unit further includes a control module for controlling the video acquisition and preprocessing unit to perform focused detection on the candidate regions or to adjust the sensitivity of the detection algorithm.

[0030] Preferably, the multi-objective game association unit includes a cost function calculation module and a strategy solving module; the cost function calculation module is used to construct a cost function model for each target based on the distances between targets and the expected velocity information of the targets; the strategy solving module is used to calculate the optimal motion strategy of each target in the current frame using an iterative algorithm or another game-equilibrium solving algorithm, and to output the predicted position of each target in the next frame; the multi-objective game association unit matches the predicted positions against the next-frame detection results provided by the video acquisition and preprocessing unit, and when conflicting matches occur, the association with the lower cost is selected to determine the target identity and update the trajectory.
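The conflict rule at the end of [0030] ("the association with the lower cost is selected") can be sketched as greedy matching in order of increasing cost. In this illustration the association cost is simply the Euclidean distance between a predicted position and a detection, and the `gate` threshold is a hypothetical parameter; in the patent the cost would come from the game model. Greedy matching is not globally optimal in general (an optimal linear assignment solver would be), but it implements the stated tie-break directly.

```python
import math

Vec = tuple[float, float]

def distance_costs(predicted: dict[int, Vec],
                   detections: list[Vec]) -> dict[tuple[int, int], float]:
    """Association cost = Euclidean distance between predicted and detected positions."""
    return {(t, d): math.dist(p, q)
            for t, p in predicted.items() for d, q in enumerate(detections)}

def resolve_matches(costs: dict[tuple[int, int], float], gate: float) -> dict[int, int]:
    """Greedy matching in increasing cost: whenever two tracks compete for one
    detection (or one track for two detections), the lower-cost pair wins."""
    matches: dict[int, int] = {}
    used: set[int] = set()
    for (trk, det), c in sorted(costs.items(), key=lambda kv: kv[1]):
        if c > gate:
            break  # every remaining pair is costlier still
        if trk in matches or det in used:
            continue  # conflict already settled by a cheaper pair
        matches[trk] = det
        used.add(det)
    return matches
```

With predicted positions `{0: (0.0, 0.0), 1: (0.5, 0.0)}` and detections `[(0.4, 0.0), (3.0, 0.0)]`, both tracks prefer detection 0; track 1 is closer and wins it, and track 0 falls back to detection 1 only if the gate allows a cost of 3.0.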

[0031] Preferably, the attention calculation module of the self-supervised trajectory guidance unit is implemented by a deep neural network, which uses a Transformer architecture or a spatiotemporal convolutional architecture to extract trajectory sequence features and output attention weight vectors.

[0032] The attention calculation module can perform online adaptive training based on the tracking results during system operation. Specifically, when a systematic deviation is detected between the trajectory prediction point and the actual detection point, the deviation is used as a new training signal to adjust the network weights and gradually correct the trajectory inference strategy.
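The online adaptation rule of [0032] can be illustrated with a very small stand-in model. In this hypothetical sketch, the "network weights" are reduced to an additive 2-D bias on the raw trajectory prediction; a running mean of the residuals between actual detections and predictions decides whether the deviation is systematic (norm above `tol`), and only then is the bias adjusted. `lr`, `ema` and `tol` are illustrative values; a real attention network would instead back-propagate the residual through its parameters.

```python
import math

Vec = tuple[float, float]

class OnlineBiasCorrector:
    """Hypothetical stand-in for online adaptation: learns an additive correction
    to raw trajectory predictions from prediction/detection residuals."""

    def __init__(self, lr: float = 0.5, ema: float = 0.3, tol: float = 0.2):
        self.lr = lr    # step size of each correction
        self.ema = ema  # smoothing factor of the running residual mean
        self.tol = tol  # residual norm regarded as "systematic"
        self.bias = [0.0, 0.0]
        self.mean_err = [0.0, 0.0]

    def predict(self, raw: Vec) -> Vec:
        return (raw[0] + self.bias[0], raw[1] + self.bias[1])

    def observe(self, raw: Vec, actual: Vec) -> bool:
        """Feed one (raw prediction, actual detection) pair; return True if the bias changed."""
        pred = self.predict(raw)
        for k in range(2):
            residual = actual[k] - pred[k]
            self.mean_err[k] = (1.0 - self.ema) * self.mean_err[k] + self.ema * residual
        if math.hypot(*self.mean_err) <= self.tol:
            return False  # deviation looks like noise: leave the model unchanged
        for k in range(2):
            self.bias[k] += self.lr * self.mean_err[k]
        return True
```

Gating the update on a smoothed residual is what separates a "systematic deviation" from frame-to-frame detector jitter; without it, every noisy detection would perturb the model.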

[0033] Beneficial Effects: The automatic tracking and trajectory generation method and system for moving targets in surveillance videos provided by this invention have the following beneficial effects:

[0034] 1. The present invention, through the active detection, perception and prediction mechanism in step 2, does not simply use linear extrapolation or short-term memory when it detects that an existing target is missing in the current frame. Instead, it constructs a probability field that spreads over time based on the state of the target before it disappears, dynamically models the positional uncertainty of the target during the occlusion period, and combines scene obstacle constraints to actively enhance detection and perception in the high-probability prediction area. This solves the technical problem that existing technologies are prone to directly discarding the trajectory and unable to continuously recover the target trajectory under long-term occlusion or weak detection signals. It achieves the technical effect of continuously outputting complete trajectories in complex occlusion environments and significantly reducing the trajectory interruption rate and missed detection rate in long-term occlusion scenarios.

[0035] 2. Through the multi-target interactive game association mechanism in Step 3 and the self-supervised dynamic path guidance mechanism in Step 4, this invention models the collision-avoidance and yielding behavior between multiple targets as a multi-agent game problem and, on this basis, uses a self-supervised-trained trajectory attention module to jointly model historical trajectories and scene constraints. This allows reasonable prediction of the movement trends and yielding relationships of the targets when they meet at close range or cross each other under occlusion, thus preventing incorrect identity switches, and solves the technical problem of frequent identity mismatches caused by relying on simple distance or appearance similarity in dense interaction scenarios. Furthermore, it dynamically corrects and smooths trajectory points through adaptive path attention weights, guiding the trajectory to conform to the scene topology and the targets' movement habits. This resolves physical inconsistencies of existing technologies, such as unreasonable trajectory jumps and trajectories passing through obstacles. It achieves the technical effect of simultaneously improving the stability of identity maintenance and the physical plausibility of trajectories in complex low-light, occluded, and densely interacting scenarios, and improves the overall robustness and accuracy of multi-target tracking.

Attached Figure Description

[0036] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the accompanying drawings used in the description of the embodiments or the prior art will be briefly introduced below.

[0037] Figure 1 is a flowchart of the method of the present invention;

[0038] Figure 2 is a schematic diagram of the probability distribution field for active occlusion prediction;

[0039] Figure 3 is a schematic diagram illustrating the relationships within the multi-objective interactive game;

[0040] Figure 4 is a diagram of the overall system.

Detailed Implementation

[0041] The present invention will now be described more clearly and completely by way of a preferred embodiment in conjunction with the accompanying drawings, but this does not limit the invention to the scope of the described embodiment.

[0042] As shown in Figure 1, a method for automatic tracking and trajectory generation of moving targets in surveillance videos specifically includes the following steps:

[0043] Step 1: Obtain the current frame image of the surveillance video sequence, detect moving targets appearing in the frame, and obtain the initial state information of each target; for subsequent frames of the sequence, inherit the state prediction of the existing targets in the previous frame for association and matching.

[0044] Step 2: For a target detected as missing in the current frame, perform trajectory prediction based on the motion state of the target before it disappeared, simulate the diffusion process of the uncertainty of the target position to estimate the area where it may appear in the current frame, and actively perform target detection in that area to recover the trajectory of the occluded target;

[0045] Figure 2 is a schematic diagram illustrating how the active detection and perception trajectory prediction mechanism of this invention constructs a probability distribution field in a target occlusion scenario. As shown in Figure 2, the outer frame represents a monitoring scene. The target disappearance location marked in the center of the scene is the last detected position of a moving target in the previous frame. The concentric dashed circles drawn around this location represent the probability distribution field spreading over time, which characterizes the spatial uncertainty of the target during occlusion. The rectangular areas marked as occlusions represent fixed obstacles that cause the target to temporarily disappear from the field of view. The area behind the occlusions, outlined with dashed lines, is marked as the prediction candidate region. This region is calculated by the mechanism from the target's motion state before disappearance, the spread of the probability distribution field, and the layout of the occlusions, and it represents the high-probability location where the target may reappear in the current frame after bypassing the occlusions. Within this prediction candidate region, the system lowers the detection threshold or uses enhanced detection to actively search for the target. Once a candidate target is detected in this region, it can be associated with the missing target and its trajectory recovered.

[0046] Preferably, step 2 further includes: when the target existed in the previous frame but was not detected in the current frame, the last location of the target is taken as the initial location, a probability distribution field is constructed around it and diffused over time to simulate the possible location propagation of the target; the diffusion radius of the probability distribution field is increased according to the target's maximum speed prediction, and the probability distribution field is constrained or deflected in combination with environmental obstacle information to predict multiple possible locations of the target after bypassing the obstruction; the system lowers the detection threshold or uses other sensing methods to actively detect the target near the predicted location, and once detected, it is associated with the aforementioned missing target and its trajectory is restored.

[0047] This invention introduces an active detection and trajectory prediction mechanism to address the problem of trajectory prediction and recovery when a target is occluded. Unlike traditional methods that passively wait for the target to reappear, this mechanism actively initiates detection when the target disappears: using the target's state before disappearance, it predicts the target's possible current location and trajectory by propagating a wave-like fluctuation through a latent (implicit) space. This propagation simulation resembles the diffusion of seismic waves encountering anomalies in a medium: a dynamically expanding probability wave field is generated, centered on the target's disappearance point and shaped by its historical velocity direction and environmental constraints, to indicate the target's possible location. By analyzing this probability field, the mechanism can predict where and when the target is likely to reappear after occlusion, thereby actively guiding the detector to intensify its search in that area and achieving continuous tracking of targets occluded for a long time. This mechanism effectively overcomes the tendency of existing technologies to lose targets under long-term occlusion.

[0048] Suppose a target disappears at time $t_0$, with last observed position $\mathbf{p}_0 = (x_0, y_0)$ and velocity $\mathbf{v}_0 = (v_x, v_y)$. Based on this most recent state, the mechanism constructs a probability field $P(\mathbf{x}, t)$ that diffuses over time and describes the possible location distribution of the target. At the initial time $t = t_0$, $P(\mathbf{x}, t_0)$ is set to a Gaussian distribution centered at $\mathbf{p}_0$, and its diffusion radius $\sigma(t)$ grows linearly over time (simulating the wavefront diffusion of uncertainty). For example, the following formula can be used:

[0049]

$$\sigma(t) = \sigma_0 + v_{\max}\,(t - t_0)$$

[0050] where $\sigma_0$ is the uncertainty of the initial position, $v_{\max}$ is the maximum possible velocity of the target and controls the speed at which the wavefront spreads, and $t$ is the current prediction time. Meanwhile, the center of the probability field drifts with the velocity $\mathbf{v}_0$, i.e., the expected position is updated over time as $\boldsymbol{\mu}(t) = \big(x_0 + v_x (t - t_0),\; y_0 + v_y (t - t_0)\big)$, where $(x_0, y_0)$ are the horizontal and vertical coordinates of the target's last observed position at time $t_0$, and $v_x$, $v_y$ are the estimated horizontal and vertical velocity components of the target before it disappeared, describing its direction and speed of motion. The resulting $P(\mathbf{x}, t)$ resembles a probability wave centered at $\boldsymbol{\mu}(t)$ whose radius $\sigma(t)$ gradually increases. When this wave meets a static obstacle during propagation, it is reflected or diffracted; this can be simulated by introducing an obstacle potential field into the probability field (e.g., attenuating the probability inside obstacle regions to 0 and generating diffusion side peaks at their edges) and computed by numerical iteration. Based on the distribution at the current frame time $t$, the mechanism obtains the set of possible target locations in the current frame (for example, by selecting several high-probability-density regions). The system then proactively enhances target detection within these high-probability regions (e.g., by lowering the detection threshold or enabling motion-information assistance) to detect the reappearance of the occluded target. If a matching target is detected in one of these regions, it is determined to correspond to the disappeared target and its trajectory is restored; if no matching target is detected, the predicted state of the trajectory is retained and tracking continues in the next frame.
In this active prediction process, the present invention effectively fills the trajectory gap between the disappearance and reappearance of the target by simulating wave diffusion, thus achieving trajectory continuity maintenance under occlusion conditions.
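The probability field of [0048] to [0050] can be evaluated numerically on a grid. This is a minimal sketch under stated assumptions: the field is taken as an isotropic Gaussian with center $\boldsymbol{\mu}(t)$ and standard deviation $\sigma(t) = \sigma_0 + v_{\max}(t - t_0)$, obstacles are axis-aligned rectangles whose cells are simply set to zero, and the reflection and diffraction side peaks are not modelled. The grid, the rectangle format and the function names are hypothetical.

```python
import math

Vec = tuple[float, float]
Rect = tuple[float, float, float, float]  # obstacle as (xmin, ymin, xmax, ymax)

def diffusion_field(p0: Vec, v0: Vec, dt: float, sigma0: float, v_max: float,
                    obstacles: list[Rect], grid: list[Vec]) -> dict[Vec, float]:
    """Normalised probability of the target being at each grid point, dt after it vanished."""
    sigma = sigma0 + v_max * dt                    # sigma(t) = sigma0 + v_max * (t - t0)
    mu = (p0[0] + v0[0] * dt, p0[1] + v0[1] * dt)  # centre drifts with the last velocity
    density: dict[Vec, float] = {}
    for x, y in grid:
        if any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in obstacles):
            density[(x, y)] = 0.0                  # probability inside obstacles set to 0
        else:
            d2 = (x - mu[0]) ** 2 + (y - mu[1]) ** 2
            density[(x, y)] = math.exp(-d2 / (2.0 * sigma ** 2))
    total = sum(density.values())
    if total > 0.0:
        density = {k: v / total for k, v in density.items()}
    return density

def top_candidates(density: dict[Vec, float], k: int = 3) -> list[Vec]:
    """The k grid points with the highest probability, i.e. where detection is enhanced."""
    return sorted(density, key=density.get, reverse=True)[:k]
```

Because $\sigma$ grows with $v_{\max}$, the field spreads faster for faster targets. Reproducing the side peaks at obstacle edges would need an additional propagation step (for example, iterating a diffusion stencil that treats obstacles as boundaries), which this sketch leaves out.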

[0051] Step 3: Treat all targets detected in the current frame and all targets recovered through prediction as multiple agents, and establish an interactive game model that includes all targets; calculate the motion decision strategy of each target from the relative positions and motion trends between targets; perform target identity association and trajectory matching based on the game decision results of each target; and update the identity label and state parameters of each target.

[0052] Figure 3 is a schematic diagram of the multi-target interactive game association mechanism of the present invention. The rectangle on the left of Figure 3 represents the monitoring screen, on which multiple moving targets such as Target 1, Target 2, and Target 3 are marked. The dashed lines connecting the targets indicate the interaction constraints and relative positions between them, and the arrows near the targets indicate the velocity direction and strategy tendency of each target. The right side of Figure 3 represents the multi-objective game association unit, which includes a cost function calculation module and a policy solving module. The cost function calculation module constructs a cost function for each target that considers both the target's deviation from its desired motion and the penalty for collisions or close proximity with other targets. The policy solving module solves the equilibrium policy of the multi-agent game, or obtains the motion policy of each target in the current frame through an iterative optimization algorithm, and predicts the position of each target at the next moment from that policy. The multi-objective game association unit matches the predicted positions with the detection results provided by the video acquisition and preprocessing unit. When multiple targets interact closely or occlude each other, the unit gives priority to the yield / avoidance relationship obtained from the game model when determining identity associations, thereby maintaining the continuity of target identities and the correctness of the trajectories.

[0053] Preferably, the multi-target interactive game association in Step 3 includes:

[0054] Establish a cost function for each target, which simultaneously considers the degree of deviation of the target from its expected motion as well as the penalty for colliding with or getting too close to other targets;

[0055] Obtain the motion strategy of each target in the current frame by solving the Nash equilibrium of the multi-agent game or by using an iterative optimization algorithm;

[0056] Predict the position of each target in the next moment based on the motion strategy, and use the predicted position to assist in target identity matching in the current frame;

[0057] When multiple targets interact at close range, the association is determined first based on the yield / avoidance relationship obtained from the game theory model, thereby maintaining the consistency of the identities of each target.

[0058] This invention introduces a multi-target interactive game behavior modeling mechanism to handle complex interactions between multiple targets, thereby maintaining the continuity of target identities and the correctness of decisions. Each moving target in the scene is abstracted as an agent, and a multi-agent game model is established for their movement decisions. In this model, each agent tries to optimize its own payoff or cost function, for example moving along the shortest path while avoiding collisions and keeping a certain social distance. Collision avoidance and competition between targets are modeled as a set of game rules, and all agents iteratively update their strategies until they reach a Nash equilibrium or a cooperative equilibrium. Through this game model, the mechanism can predict the avoidance and yielding decisions of targets in interactive scenarios and thus maintain identity continuity according to each target's behavioral logic even when targets cross or occlude each other. For example, when two similar-looking targets meet at close range, traditional algorithms may swap their identities when matching on appearance or trajectory distance alone, whereas the game model predicts the outcome of the interaction (such as who yields and which way each turns) and thereby distinguishes and maintains their respective trajectories. This mechanism addresses a shortcoming of existing methods such as TrackFormer, which do not explicitly model the interaction decisions between targets.

[0059] First, each target is abstracted as an agent $i$ with a state $s_i$ (including position, velocity, etc.) and a policy variable $u_i$ (such as an acceleration or steering decision). To model the interaction between agents, a cost function $J_i$ is defined for each agent, for example:

[0060]
$$J_i = \int_0^{T} \Big( w_1 \lVert v_i(t) - v_i^{*} \rVert^2 + w_2 \sum_{j \neq i} \frac{1}{\lVert p_i(t) - p_j(t) \rVert^2} \Big)\, dt$$

[0061] where $v_i^{*}$ is the desired travel velocity of agent $i$ (which can be estimated from its historical trajectory or destination direction), $v_i(t)$ is its velocity, $p_i(t)$ and $p_j(t)$ are the positions of agents $i$ and $j$, and $w_1$ and $w_2$ are weighting parameters. The first term encourages the agent to maintain its intended direction and speed, while the second term drives the cost toward infinity when the agent gets too close to another agent (to avoid collisions). $T$ is the length of the prediction or decision time window and sets the time scale of the game optimization, and $t$ is the integration time variable. Each agent chooses a policy $u_i$ to minimize its own cost $J_i$. In the ideal (continuous-time) case this is a differential game, and solving the corresponding Nash equilibrium conditions yields an optimal response strategy $u_i^{*}$ for each agent such that $J_i(u_i^{*}, u_{-i}^{*}) \le J_i(u_i, u_{-i}^{*})$ for any $u_i$, with all $u_i^{*}$ jointly satisfying the equilibrium. In the implementation of this invention, we use a discrete approximation and an iterative algorithm: for the target state set $\{s_i\}$ of the current frame, the strategy $u_i$ of each target is initialized to pure inertial motion, and each $u_i$ is then updated iteratively to reduce its own cost until the changes converge. The converged strategy combination $\{u_i^{*}\}$ is used to predict the position of each target in the next frame and to assist identity association in the current frame.
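The iterative procedure described above (start from inertial motion, let each agent in turn lower its own cost, stop when nothing changes) can be sketched as an iterated best-response search over a small discrete policy set. Several choices here are assumptions, not taken from the source: each policy is a constant velocity held over the horizon, the candidate set (the desired velocity scaled by 1, 0.5, 0 and rotated by 0 and ±30 degrees) is hypothetical, the proximity penalty is taken to be inverse-square with a small `eps` to avoid division by zero, and the integral is replaced by a sum over `steps` samples of width `dt`.

```python
import math
import numpy as np


def rollout(p, u, dt, steps):
    """Positions at dt, 2*dt, ..., steps*dt under constant velocity u."""
    taus = np.arange(1, steps + 1)[:, None] * dt
    return p[None, :] + taus * u[None, :]


def agent_cost(i, pos, u, v_des, dt, steps, w1=1.0, w2=1.0, eps=1e-6):
    """Discretised cost of agent i: velocity deviation plus an (assumed) inverse-square proximity penalty."""
    traj_i = rollout(pos[i], u[i], dt, steps)
    cost = steps * w1 * float(np.sum((u[i] - v_des[i]) ** 2))
    for j in range(len(pos)):
        if j != i:
            d2 = np.sum((traj_i - rollout(pos[j], u[j], dt, steps)) ** 2, axis=1)
            cost += w2 * float(np.sum(1.0 / (d2 + eps)))
    return dt * cost


def candidate_controls(v_des, scales=(1.0, 0.5, 0.0), angles_deg=(0.0, -30.0, 30.0)):
    """Hypothetical discrete policy set: the desired velocity, scaled and rotated."""
    out = []
    for s in scales:
        for a in angles_deg:
            c, sn = math.cos(math.radians(a)), math.sin(math.radians(a))
            out.append(s * np.array([c * v_des[0] - sn * v_des[1], sn * v_des[0] + c * v_des[1]]))
    return out


def best_response_game(pos, vel, v_des, dt=0.5, steps=8, max_rounds=20):
    """Iterated best response starting from inertial motion; returns (strategies, converged)."""
    u = [np.asarray(v, dtype=float).copy() for v in vel]
    for _ in range(max_rounds):
        changed = False
        for i in range(len(pos)):
            best, best_cost = u[i], agent_cost(i, pos, u, v_des, dt, steps)
            for cand in candidate_controls(v_des[i]):
                trial = list(u)
                trial[i] = cand
                cost = agent_cost(i, pos, trial, v_des, dt, steps)
                if cost < best_cost - 1e-12:  # strict improvement only, so ties do not oscillate
                    best, best_cost = cand, cost
            if not np.allclose(best, u[i]):
                u[i], changed = best, True
        if not changed:
            return u, True
    return u, False
```

For two agents walking head-on along the same line, pure inertial motion leads to a collision, while the converged strategies have one agent veer aside and keep a gap of about one unit. Predicted next-frame positions are then `pos[i] + u[i] * dt`. Exhaustive best response is only practical for a handful of interacting targets; the source leaves the choice of solver open.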

[0062] Specifically, for identity association, the results of the game model are used to optimize data association. On the one hand, the predicted position of each target at the next moment, obtained by deducing its policy, can be matched against the actual detected positions in the next frame, which improves matching accuracy. On the other hand, for multiple targets in close proximity in the current frame, traditional nearest-neighbor or Hungarian matching may produce mismatches; by examining their game strategies (e.g., which one slows down to avoid the other and which one accelerates to overtake), the plausible correspondence of motion trajectories can be inferred, correcting errors of pure appearance / distance matching. In other words, when two targets cross, this mechanism determines identity continuity from the game outcome of who yields to whom. For example, if ID A slows down to yield according to its strategy while ID B maintains its speed, then after the crossing the target that has moved further forward should be ID B, not ID A, which avoids wrongly swapping their identities. By incorporating the game model into the association algorithm, this invention reduces ID switching errors in multi-target close-range interaction scenarios and keeps trajectories consistent and reliable.
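Matching game-predicted positions to detections, as described above, can be sketched as a minimum-total-distance assignment with a distance gate. This is a hedged illustration: it uses exhaustive search over permutations (equivalent in result to Hungarian matching, but only feasible for a few interacting targets), and the `gate` value and the rule that an unmatched track costs as much as the gate are assumptions, not taken from the source.

```python
import math
from itertools import permutations


def associate(predicted, detections, gate=1.5):
    """Assign track IDs to detection indices, minimising total distance to predicted positions.

    predicted: {track_id: (x, y)} positions predicted by the game model
    detections: list of (x, y) detections in the new frame
    Returns {track_id: detection_index}; tracks left unmatched are omitted.
    """
    ids = list(predicted)
    pool = list(range(len(detections))) + [None] * len(ids)  # None = leave track unmatched
    best, best_cost = {}, math.inf
    for choice in permutations(pool, len(ids)):
        cost, ok = 0.0, True
        for tid, d in zip(ids, choice):
            if d is None:
                cost += gate  # assumed penalty for leaving a track unmatched
                continue
            dist = math.dist(predicted[tid], detections[d])
            if dist > gate:  # outside the gate: this pairing is not allowed
                ok = False
                break
            cost += dist
        if ok and cost < best_cost:
            best_cost = cost
            best = {tid: d for tid, d in zip(ids, choice) if d is not None}
    return best
```

In the crossing example, if the game model predicts that ID A (yielding) will be at (1.0, 0.5) and ID B (keeping its speed) at (2.0, 0.5), then detections at (2.05, 0.5) and (0.95, 0.5) are assigned as `{"A": 1, "B": 0}`, so the target further forward keeps ID B.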

[0063] Step 4: Smooth and extend the updated target trajectories. In the trajectory generation process, a self-supervised learning attention mechanism is introduced. The path attention weight is adaptively allocated according to the target's historical trajectory and scene constraints to guide the trajectory points to converge to the high weight region, so as to generate a smooth and continuous trajectory that conforms to the target's real movement trend.

[0064] Step 4 specifically includes: providing a trajectory attention model pre-trained in an unsupervised or self-supervised manner; analyzing the historical trajectory sequence of each target to generate an attention weight distribution over future trajectory points; and adjusting the original predicted trajectory based on the attention weights, increasing the weight or offset of trajectory points in high-weight directions and weakening the trajectory offset in low-weight directions, so that the final output trajectory is smooth while conforming to scene constraints and target motion laws. The trajectory attention model is trained on segments of unlabeled video data in which the target disappears and reappears or is partially occluded: the model predicts the target's position during the occlusion, the prediction is compared with the actual reappearance position, and the model parameters are adjusted iteratively, which enables the model to self-correct the trajectory during online tracking.
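The self-supervised training signal described above (hide part of an unlabeled track, predict the reappearance, compare, adjust) can be illustrated with a deliberately tiny model. This sketch does not implement the attention network: as a hypothetical stand-in it fits a single gain `theta` in "last position plus `theta` times gap times last velocity" by gradient descent on mean squared error. The point it shows is that the targets come from the tracks themselves, with no manual labels. The function names and hyperparameters are assumptions.

```python
import numpy as np


def make_occlusion_pairs(tracks, gap):
    """Self-supervised pairs: pretend `gap` frames are occluded and use the reappearance as target."""
    X, Y = [], []
    for tr in tracks:
        tr = np.asarray(tr, dtype=float)
        for s in range(1, len(tr) - gap):
            last, prev = tr[s], tr[s - 1]
            X.append(np.concatenate([last, last - prev]))  # position + per-frame velocity
            Y.append(tr[s + gap])  # where the target actually reappears
    return np.array(X), np.array(Y)


def train_gain(X, Y, gap, lr=0.01, epochs=200):
    """Fit theta so that last + theta * gap * velocity predicts the reappearance position."""
    last, vel = X[:, :2], X[:, 2:]
    theta = 0.0
    for _ in range(epochs):
        resid = last + theta * gap * vel - Y
        grad = 2.0 * np.mean(np.sum(resid * gap * vel, axis=1))  # d(mean squared error)/d(theta)
        theta -= lr * grad
    return float(theta)
```

On constant-velocity tracks the learned gain approaches 1; on tracks where targets tend to slow down behind occluders it would settle below 1. A real attention model would replace the scalar with network parameters trained on the same kind of pairs.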

[0065] This invention introduces a self-supervised dynamic path guidance mechanism that combines self-supervised learning with attention guidance to focus dynamically on the target's motion path during trajectory generation. Specifically, a self-supervised attention module is introduced into the tracking model. The module requires no manual annotation; it learns to focus on key points of the trajectory by mining historical trajectories and scene information. During online tracking, the attention module generates a dynamic path weight distribution from the current target state and environmental features, highlighting areas the predicted trajectory may pass through and thus guiding the algorithm to concentrate computational resources on these high-probability paths. This path guidance acts like a spotlight that is adjusted over time, always focused on the most likely motion path while ignoring irrelevant interference. Through self-supervised training, the model learns to adjust attention weights from past trajectory patterns, for example narrowing the focus area when the target's movement is constrained (such as in corridors or on roads) and widening it when the target may turn or jump suddenly. This mechanism improves the accuracy and smoothness of trajectory prediction and reduces the algorithm's dependence on external parameter tuning, reflecting an autonomous improvement that goes beyond the teaching of existing technologies.

[0066] Trajectory generation can use conventional linear interpolation or higher-order motion models to subdivide the target's motion between adjacent frames and obtain a smooth trajectory. On top of this, a path-guided attention module is introduced to adjust the trajectory direction adaptively. This module is a spatiotemporal attention network pre-trained on large amounts of unlabeled video data to learn common trajectory patterns. Given the historical trajectory $H_i$ and the current state $s_i$ of each target, the attention module computes an importance weight $a_q$ for each point $q \in \mathcal{Q}$ within a future time period, where $\mathcal{Q}$ denotes the set of candidate path points. These weights reflect how strongly the model attends to each possible subsequent path of the target. For example, in a corridor scene attention may concentrate on a line along the corridor, whereas in an open area it may spread over the several directions the target might take. The algorithm then guides the trajectory accordingly: it increases the confidence of trajectory points in high-weight directions, stretching the trajectory to follow the focus of attention where appropriate, and smooths out or filters low-weight deviation points. Formally, the sequence of original predicted trajectory points $\{p_{t+k}\}_{k=1}^{K}$ is corrected as follows:

[0067]
$$\hat{p}_{t+k} = p_{t+k} + w_{t+k}\,\Delta p_{t+k}, \qquad k = 1, 2, \dots, K$$

[0068] where $\hat{p}_{t+k}$ is the position of the trajectory point at time $t+k$ after attention-guided correction, i.e., the final output trajectory point coordinate; $p_{t+k}$ is the position of the original predicted trajectory point at time $t+k$ from the basic motion model (such as a constant-velocity model or the game prediction result); $\Delta p_{t+k}$ is the attention offset, a small adjustment vector pointing toward the high-weight region; $w_{t+k}$ is the attention weight coefficient assigned by the attention module to the trajectory point at time $t+k$; $k$ is the number of time steps after the current frame; and $K$ is the upper limit on the number of trajectory points predicted or smoothed forward. Through this dynamic adjustment, the final output trajectory better matches the actual movement trend of the target and suppresses random noise and anomalous jumps as far as possible. Notably, the attention module is trained in a self-supervised manner: for example, using segments of the training data in which the target is partially occluded, the module predicts the target's location after the occlusion from the trajectory before it, and the error against the actual location is backpropagated for optimization; alternatively, using segments in which the target leaves the camera's field of view and later reappears, the module is trained to predict the possible range of motion during the absence. This training lets the module improve without explicit supervision and improves the model's generalization. In online use, the module can also keep adapting: when a deviation between the trajectory prediction and the actual detection is found, the system can use it as a new training sample to fine-tune the attention network, gradually adapting it to the characteristics of the current scene. This self-supervised adjustment ensures that the trajectory guidance mechanism can function effectively in various environments.
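The correction rule above (each output point equals the raw predicted point plus a weight times an offset toward the high-weight region) can be sketched as follows. How the weights and offsets are produced is not specified in the source, so this sketch assumes: the attention network's output is given as per-step logits over candidate points (supplied by the caller, standing in for the network); the offset points from the raw point to the attention-weighted centroid of the candidates; and the weight coefficient is the peak attention weight times a hypothetical `strength` factor, so sharper attention pulls harder.

```python
import numpy as np


def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()


def attention_correct(raw_traj, candidates, logits, strength=1.0):
    """Pull each raw predicted point toward its attention focus.

    raw_traj: (K, 2) raw predicted points for steps t+1 .. t+K
    candidates: list of K arrays, each (n_k, 2), candidate path points for that step
    logits: list of K arrays, each (n_k,), attention scores (stand-in for the network output)
    """
    raw_traj = np.asarray(raw_traj, dtype=float)
    out = raw_traj.copy()
    for k in range(len(raw_traj)):
        a = softmax(logits[k])  # attention weights over the candidate points
        focus = a @ np.asarray(candidates[k], dtype=float)  # attention-weighted target point
        delta = focus - raw_traj[k]  # offset pointing toward the high-weight region
        w = strength * a.max()  # assumed confidence coefficient for this step
        out[k] = raw_traj[k] + w * delta
    return out
```

With `strength` at most 1, every corrected point lies on the segment between the raw point and the attention focus, so the correction can never overshoot into regions the attention did not select.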

[0069] Step 5: Output and store the identities and trajectory coordinates of each target confirmed in the current frame, process subsequent frames in a loop, and continuously apply steps 2-4 throughout the entire video sequence to achieve automatic continuous tracking of moving targets and generation of complete trajectories.

[0070] After completing the above steps, the system stores the target status confirmed in the current frame and the updated trajectory, and then enters the next frame loop. In the output stage, this invention can provide a globally unique identity ID and a time-varying trajectory coordinate sequence for each target. The trajectory can be used for subsequent applications such as behavior analysis, velocity statistics, and anomaly detection.

[0071] As shown in Figure 4, an automatic tracking and trajectory generation system for moving targets in surveillance videos is used to implement the automatic tracking and trajectory generation method for moving targets in surveillance videos described above. The system includes a video acquisition and preprocessing unit, an active trajectory prediction unit, a multi-objective game association unit, a self-supervised trajectory guidance unit, and a result output unit; wherein:

[0072] The video acquisition and preprocessing unit is used to acquire video frame sequences from the surveillance camera and perform target detection on each frame to obtain initial detection information for multiple targets.

[0073] The active trajectory prediction unit is used, when a target is detected to be missing, to calculate the predicted location region of the target in subsequent frames based on the state in which the missing target last appeared. The active trajectory prediction unit is configured to use a wave diffusion model to simulate the propagation of the uncertainty of the target position and to control the camera or detection algorithm to actively perceive the predicted region, thereby recapturing the temporarily lost target.

[0074] The multi-objective game association unit is used to receive the detection information provided by the video acquisition and preprocessing unit and the predicted positions provided by the active trajectory prediction unit, and to perform identity association and state update for all targets in the current frame. The multi-objective game association unit integrates a game model, which determines the motion trend of each target by solving for the optimal decisions of multiple agents, and performs target-trajectory matching and identity preservation accordingly. When targets occlude or interact with each other, the game model result takes priority in maintaining the correct trajectory correspondence.

[0075] The self-supervised trajectory guidance unit is used to smooth and optimize the trajectories of each target after target association is completed. The self-supervised trajectory guidance unit includes an attention calculation module, which uses the target's historical trajectory and scene information to generate attention weights for future trajectory points and adjusts the position of the trajectory points to fit the high-weight area, thereby achieving dynamic guidance of the trajectory. This unit does not require manual calibration and continuously updates the attention model through self-supervised learning to adapt to environmental changes, thereby improving the accuracy of trajectory prediction.

[0076] The result output unit is used to output and record the identity of each tracked target and its continuous motion trajectory data. The continuous motion trajectory data is jointly determined by the multi-objective game association unit and the self-supervised trajectory guidance unit.

[0077] Preferably, the active trajectory prediction unit is configured to: acquire the position and motion vector of the target at the moment it disappears, calculate the possible diffusion range of the target position using a preset maximum velocity parameter, and crop or guide the diffusion range using pre-stored scene obstacle layout information to obtain several candidate regions; the active trajectory prediction unit further includes a control module, which controls the video acquisition and preprocessing unit to perform focused detection on the candidate regions or to adjust the sensitivity of the detection algorithm, so as to achieve target re-identification and trajectory recovery.
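Adjusting detection sensitivity inside the candidate regions, as described above, can be sketched as a region-dependent confidence threshold. This is a minimal illustration under assumptions not in the source: candidate regions are circles `(cx, cy, r)`, detections are `(x, y, score)` tuples, and the two threshold values are hypothetical; the lost target is recovered as the highest-scoring detection that falls inside a candidate region.

```python
import math


def active_recheck(detections, regions, base_thr=0.5, region_thr=0.3):
    """Accept detections using a lower confidence threshold inside candidate regions.

    detections: list of (x, y, score); regions: list of (cx, cy, r) circles
    Returns (x, y, score, in_region) for every accepted detection.
    """
    accepted = []
    for x, y, s in detections:
        in_region = any(math.hypot(x - cx, y - cy) <= r for cx, cy, r in regions)
        thr = region_thr if in_region else base_thr  # more sensitive where the target is expected
        if s >= thr:
            accepted.append((x, y, s, in_region))
    return accepted


def recover_lost_track(detections, regions, **kw):
    """Return the best accepted detection inside a candidate region, or None."""
    hits = [d for d in active_recheck(detections, regions, **kw) if d[3]]
    return max(hits, key=lambda d: d[2]) if hits else None
```

A weak detection (score 0.35) is kept only when it lies in a predicted region; the same score elsewhere is rejected by the normal threshold, which limits the extra false positives that a global threshold reduction would cause.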

[0078] Preferably, the multi-objective game association unit includes a cost function calculation module and a strategy solving module; the cost function calculation module constructs a cost function model for each target based on the distances between targets and the expected velocity information of the targets; the strategy solving module calculates the optimal motion strategy of each target in the current frame using an iterative algorithm or another game-equilibrium solving algorithm and outputs the predicted position of each target in the next frame; the multi-objective game association unit matches the predicted positions against the next-frame detection results provided by the video acquisition and preprocessing unit, and when conflicting matches occur, selects the association with the lower cost function value, thereby determining the target identities and updating the trajectories.

[0079] Preferably, the attention calculation module of the self-supervised trajectory guidance unit is implemented by a deep neural network, which adopts a Transformer architecture or a spatiotemporal convolutional architecture to extract trajectory sequence features and output attention weight vectors;

[0080] The attention calculation module can perform online adaptive training based on the tracking results during system operation. Specifically, when a systematic deviation is detected between the trajectory prediction point and the actual detection point, the deviation is used as a new training signal to adjust the network weights, thereby gradually correcting the trajectory inference strategy.
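Detecting a systematic deviation and turning it into a training signal, as described above, can be sketched with an exponential moving average of prediction residuals: zero-mean noise averages out, while a persistent offset drives the average past a threshold and triggers an update. This is a hypothetical stand-in: instead of fine-tuning network weights it learns a single bias vector added to future predictions, and `alpha`, `threshold`, and `lr` are assumed values.

```python
import numpy as np


class DeviationMonitor:
    """Flags systematic (non-zero-mean) prediction error and learns a correcting bias.

    Toy stand-in for online fine-tuning of the attention network; all parameters are hypothetical.
    """

    def __init__(self, alpha=0.2, threshold=0.5, lr=0.5):
        self.alpha, self.threshold, self.lr = alpha, threshold, lr
        self.ema = np.zeros(2)  # running mean of (detected - corrected prediction)
        self.bias = np.zeros(2)  # learned correction applied to future predictions

    def correct(self, predicted):
        return np.asarray(predicted, dtype=float) + self.bias

    def update(self, predicted, detected):
        """Record one residual; return True if a systematic deviation triggered an update."""
        resid = np.asarray(detected, dtype=float) - self.correct(predicted)
        self.ema = (1 - self.alpha) * self.ema + self.alpha * resid
        if np.linalg.norm(self.ema) > self.threshold:
            self.bias += self.lr * self.ema  # use the deviation as the training signal
            return True
        return False
```

With a constant one-unit offset between prediction and detection, a few updates fire and the remaining error drops to roughly 0.1 units; with residuals alternating at ±0.3 the average stays small and no update fires.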

[0081] Example

[0082] The following example, a complex low-light underground parking lot, illustrates the working process and advantages of this invention. Underground parking lots are typically poorly lit and full of obstructions such as pillars and vehicles, and pedestrian paths are irregular, which poses a significant challenge to traditional tracking algorithms. In this scenario, the mechanisms of this invention work together. When a pedestrian walks behind a parking pillar and suddenly disappears from the camera's view, the active detection mechanism is activated immediately: it treats the pedestrian's last position as a wave source and simulates the propagation of the wave through the parking lot's passageways, thereby predicting when and where the pedestrian may reappear on the other side of the pillar or in a nearby area. Because the dim lighting reduces the detector's confidence, this invention raises detection sensitivity in the predicted area and eventually recaptures the pedestrian on the other side of the pillar, connecting the trajectory seamlessly. Likewise, when two pedestrians meet face to face in a narrow passageway, the game-theoretic model comes into play: it predicts that each will keep to one side according to courtesy rules, so even if their bodies occlude each other at the moment of crossing, the system can maintain their IDs from their earlier movement intentions and avoid identity confusion at the crossing. Finally, the self-supervised path guidance module learns common walking routes (straight along a passageway, turning at intersections, etc.) from the structure of the parking lot's passageways. Therefore, when a pedestrian is briefly lost in a dimly lit corner, the module focuses its attention on plausible paths around the corner rather than extending the trajectory into impassable areas such as through walls.
This attention constraint keeps trajectory inference reliable even when the detection signal is unstable, without producing implausible trajectory points that run into walls or vehicles. Through these combined effects, this invention achieves stable multi-target tracking in this demanding scenario and performs markedly better than traditional solutions that lack these mechanisms. For example, compared with a conventional Kalman-filter prediction plus IoU matching algorithm, the trajectory interruption rate and ID switching rate of this invention are significantly reduced under prolonged occlusion and low light, demonstrating the progress this invention makes in complex environments.

[0083] It should be noted that this invention is not limited to the underground parking lot scenario above. The core mechanisms are equally applicable in other situations, such as dense crowds, blind spots in camera coverage, and seamless multi-camera tracking. For example, when monitoring a crowded commercial street, the active detection mechanism can handle frequent pedestrian occlusion, the game model can characterize the social behavior of groups walking together, and self-supervised attention can learn typical pedestrian flow patterns, thus reducing missed detections and identity confusion. This invention can also be extended to multi-camera joint tracking: when a target leaves the field of view of one camera, the wave-diffusion prediction of this invention can project its likely next appearance area into the view of adjacent cameras, enabling trajectory hand-off between cameras. Furthermore, extending the game model to a cross-camera multi-agent game can address the problem of keeping target identities consistent across different viewpoints.

[0084] The above description is only a preferred embodiment of the present invention. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.

Claims

1. A method for automatic tracking and trajectory generation of moving targets in surveillance videos, characterized by comprising the following steps: Step 1: Obtain the image of the current frame of the surveillance video sequence, detect the moving targets appearing in the current frame, and obtain the initial state information of each target; for subsequent frames in the sequence, inherit the state prediction of the targets that already existed in the previous frame for association and matching; Step 2: For a target that is missing in the current frame, perform trajectory prediction based on its motion state before it disappeared, simulate the diffusion of the uncertainty of the target position to estimate its possible occurrence area in the current frame, and actively perform target detection in that area to recover the trajectory of the occluded target; Step 3: Treat all targets detected in the current frame and all targets recovered through prediction as multiple agents, establish an interactive game model that includes all targets, calculate the motion decision strategy of each target from the relative positions and motion trends between targets, perform target identity association and trajectory matching based on the game decision results of each target, and update the identity label and state parameters of each target; Step 4: Smooth and extend the updated target trajectories; in the trajectory generation process, introduce a self-supervised learning attention mechanism to adaptively allocate path attention weights based on the target's historical trajectory and scene constraints, guide trajectory points to converge towards high-weight regions, and generate smooth and continuous trajectories that conform to the target's true movement trend;
Step 5: Output and store the identities and trajectory coordinates of each target confirmed in the current frame, process subsequent frames in a loop, and continuously apply Steps 2-4 throughout the entire video sequence to achieve automatic continuous tracking of moving targets and generation of complete trajectories; wherein the multi-target interactive game association in Step 3 includes: establishing a cost function for each target that simultaneously considers the degree of deviation of the target from its expected motion and the penalty for colliding with or getting too close to other targets; obtaining the motion strategy of each target in the current frame by solving the Nash equilibrium of the multi-agent game or by using an iterative optimization algorithm; predicting the position of each target at the next moment based on the motion strategy, and using the predicted position to assist target identity matching in the current frame; and, when multiple targets interact at close range, determining the association first according to the yield / avoidance relationship obtained from the game model, so as to maintain the consistency of the identities of each target.

2. The method for automatic tracking and trajectory generation of moving targets in surveillance video according to claim 1, characterized in that, Step 2 further includes: when a target exists in the previous frame but is not detected in the current frame, the last location of the target is used as the initial location, a probability distribution field is constructed around it and diffuses over time to simulate the possible location propagation of the target; the diffusion radius of the probability distribution field is increased according to the target's maximum speed prediction, and the probability distribution field is constrained or deflected in combination with environmental obstacle information to predict multiple possible locations of the target after bypassing the obstruction; the system lowers the detection threshold or uses other sensing methods to actively detect the target near the predicted location, and once detected, it is associated with the aforementioned missing target and its trajectory is restored.

3. The method for automatic tracking and trajectory generation of moving targets in surveillance video according to claim 1, characterized in that, Step 4 specifically includes: providing a trajectory attention model that has been pre-trained in an unsupervised or self-supervised manner; analyzing the historical trajectory sequence of each target to generate the attention weight distribution of future trajectory points; adjusting the original predicted trajectory based on the attention weights, increasing the weight or offset of trajectory points in high-weight directions, and weakening the trajectory offset in low-weight directions; the final output trajectory satisfies smoothness while conforming to scene constraints and target motion patterns; wherein, the training of the trajectory attention model utilizes segments of unlabeled video data where the target disappears and reappears or is partially occluded, and iteratively adjusting the trajectory attention model parameters by having the trajectory attention model predict the motion position during the occlusion and comparing it with the actual reappearance position.

4. A system for automatic tracking and trajectory generation of moving targets in surveillance videos, characterized in that it is used to implement the method for automatically tracking and generating trajectories of moving targets in surveillance videos according to any one of claims 1-3, and comprises a video acquisition and preprocessing unit, an active trajectory prediction unit, a multi-objective game association unit, a self-supervised trajectory guidance unit, and a result output unit; wherein: The video acquisition and preprocessing unit is used to acquire video frame sequences from the surveillance camera and perform target detection on each frame to obtain initial detection information for multiple targets. The active trajectory prediction unit is used, when a target is detected to be missing, to calculate the predicted location region of the missing target in subsequent frames based on the state in which the missing target last appeared. The active trajectory prediction unit is configured to use a wave diffusion model to simulate the propagation of the uncertainty of the target position, and to control the camera or detection algorithm to actively perceive the predicted region and recapture the temporarily lost target. The multi-objective game association unit is used to receive the detection information provided by the video acquisition and preprocessing unit and the predicted positions provided by the active trajectory prediction unit, and to perform identity association and state update for all targets in the current frame. The multi-objective game association unit integrates a game model, which determines the motion trend of each target by solving for the optimal decisions of multiple agents, and performs target-trajectory matching and identity preservation accordingly. When targets occlude or interact with each other, the game model result takes priority in maintaining the correct trajectory correspondence.
The self-supervised trajectory guidance unit is used to smooth and optimize the trajectories of each target after target association is completed. The self-supervised trajectory guidance unit includes an attention calculation module, which uses the target's historical trajectory and scene information to generate attention weights for future trajectory points and adjusts the position of the trajectory points to fit the high-weight area, thereby achieving dynamic guidance of the trajectory. This unit does not require manual calibration and continuously updates the attention model through self-supervised learning to adapt to environmental changes and improve the accuracy of trajectory prediction. The result output unit is used to output and record the identity of each tracked target and its continuous motion trajectory data. The continuous motion trajectory data is jointly determined by the multi-objective game association unit and the self-supervised trajectory guidance unit.

5. The automatic tracking and trajectory generation system for moving targets in surveillance videos according to claim 4, characterized in that the active trajectory prediction unit is configured to: acquire the position and motion vector of the target at the moment it disappears; calculate the possible diffusion range of the target position using a preset maximum velocity parameter; and clip or guide the diffusion range using pre-stored information on the scene's obstacle layout to obtain several candidate regions. The active trajectory prediction unit further includes a control module, which is configured to control the video acquisition and preprocessing unit to perform focused detection on the candidate regions or to adjust the sensitivity of the detection algorithm.
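The claim does not specify the wave diffusion model, the map representation, or how candidate regions are formed. As a minimal sketch, assuming the scene is a 2-D occupancy grid (1 = obstacle, 0 = free), the last seen cell is free, and the maximum velocity is given in cells per frame, a breadth-first wavefront stands in for the diffusion: it expands at most `int(v_max * frames)` steps from the last seen cell and cannot pass through obstacles. The reachable cells are then ranked by closeness to the motion-vector extrapolation. The function names, the 4-connected neighborhood and the `top_k` cut-off are hypothetical.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, int]


def wavefront(grid: Sequence[Sequence[int]], start: Cell, max_steps: int) -> Dict[Cell, int]:
    """BFS over free cells (0); obstacle cells (1) block the wave. Returns cell -> step count."""
    rows, cols = len(grid), len(grid[0])
    steps = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if steps[(r, c)] == max_steps:
            continue  # the wave has reached the velocity-limited radius
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and (nr, nc) not in steps:
                steps[(nr, nc)] = steps[(r, c)] + 1
                queue.append((nr, nc))
    return steps


def candidate_cells(grid: Sequence[Sequence[int]], last_cell: Cell,
                    velocity: Tuple[float, float], v_max: float,
                    frames: int, top_k: int = 5) -> List[Cell]:
    """Rank obstacle-aware reachable cells by distance to the constant-velocity extrapolation."""
    reach = wavefront(grid, last_cell, int(v_max * frames))
    er = last_cell[0] + velocity[0] * frames  # extrapolated row
    ec = last_cell[1] + velocity[1] * frames  # extrapolated column
    ranked = sorted(reach, key=lambda cell: (cell[0] - er) ** 2 + (cell[1] - ec) ** 2)
    return ranked[:top_k]


walled = [[0, 0, 0, 1, 0] for _ in range(5)]  # column 3 is a wall
print(candidate_cells(walled, (2, 2), (0.0, 1.0), v_max=1.0, frames=2))
```

In the walled example the extrapolated cell lies behind the obstacle, so the sketch returns cells on the near side of the wall; these would be the regions handed to the control module for focused detection.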

6. The automatic tracking and trajectory generation system for moving targets in surveillance videos according to claim 4, characterized in that the multi-objective game association unit includes a cost function calculation module and a strategy solving module. The cost function calculation module is configured to construct a cost function model for each target based on the distances between targets and the expected velocity of each target. The strategy solving module is configured to calculate the optimal motion strategy of each target in the current frame using an iterative algorithm or another game-equilibrium solving algorithm, and to output the predicted position of each target in the next frame. The multi-objective game association unit matches these predicted positions against the next-frame detection results provided by the video acquisition and preprocessing unit. When conflicting matches occur, the association with the lower cost function value is adopted to determine the target identity and update the trajectory.
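Claim 6 leaves the cost terms, the action space and the equilibrium solver open. A minimal sketch, assuming each target chooses one of nine discrete per-frame displacements and pays a quadratic penalty for deviating from its expected velocity plus an inverse-square penalty for approaching other targets, uses iterated best response as the "iterative algorithm". The association step then resolves conflicting matches greedily in order of increasing cost, where the association cost is assumed to be the Euclidean distance between a predicted position and a detection. The weights, the gate value and all names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, float]

# Hypothetical action set: stay, or move one unit in any of 8 directions per frame.
ACTIONS: List[Vec] = [(dx, dy) for dx in (-1.0, 0.0, 1.0) for dy in (-1.0, 0.0, 1.0)]


def goal_term(action: Vec, desired: Vec) -> float:
    return (action[0] - desired[0]) ** 2 + (action[1] - desired[1]) ** 2


def agent_cost(pos: Vec, desired: Vec, action: Vec, others: Sequence[Vec],
               w_goal: float = 1.0, w_sep: float = 4.0, eps: float = 1e-3) -> float:
    """Assumed cost: deviation from expected velocity plus a proximity penalty."""
    nx, ny = pos[0] + action[0], pos[1] + action[1]
    sep = sum(1.0 / ((nx - ox) ** 2 + (ny - oy) ** 2 + eps) for ox, oy in others)
    return w_goal * goal_term(action, desired) + w_sep * sep


def best_response_predict(positions: Sequence[Vec], desired: Sequence[Vec],
                          max_iters: int = 20) -> List[Vec]:
    """Iterated best response; returns each target's predicted next-frame position."""
    n = len(positions)
    choice = [min(ACTIONS, key=lambda a: goal_term(a, desired[i])) for i in range(n)]
    for _ in range(max_iters):
        changed = False
        for i in range(n):
            others = [(positions[j][0] + choice[j][0], positions[j][1] + choice[j][1])
                      for j in range(n) if j != i]
            best = min(ACTIONS, key=lambda a: agent_cost(positions[i], desired[i], a, others))
            if best != choice[i]:
                choice[i], changed = best, True
        if not changed:  # nobody wants to deviate: a pure Nash equilibrium of this discrete game
            break
    return [(p[0] + a[0], p[1] + a[1]) for p, a in zip(positions, choice)]


def associate(predicted: Sequence[Vec], detections: Sequence[Vec],
              gate: float = 3.0) -> Dict[int, int]:
    """Greedy matching: the lower-cost pairing wins whenever two pairings conflict."""
    pairs = sorted((math.dist(p, d), t, k)
                   for t, p in enumerate(predicted) for k, d in enumerate(detections))
    used_t, used_d, matches = set(), set(), {}
    for cost, t, k in pairs:
        if cost > gate:
            break  # all remaining pairs are even more expensive
        if t in used_t or k in used_d:
            continue  # conflict: a cheaper pairing already claimed this track or detection
        matches[t] = k
        used_t.add(t)
        used_d.add(k)
    return matches


pred = best_response_predict([(0.0, 0.0), (2.0, 0.0)], [(1.0, 0.0), (-1.0, 0.0)])
print(pred, associate(pred, [(1.1, 1.0), (0.9, -1.0)]))
```

In the example the two targets head for the same cell; the separation term makes them sidestep each other, which is the kind of interaction reasoning the claim attributes to the game model. Greedy matching is used for brevity; an optimal assignment solver could replace it.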

7. The automatic tracking and trajectory generation system for moving targets in surveillance videos according to claim 4, characterized in that the attention calculation module of the self-supervised trajectory guidance unit is implemented by a deep neural network, which uses a Transformer architecture or a spatiotemporal convolutional architecture to extract trajectory sequence features and output attention weight vectors. The attention calculation module can perform online adaptive training based on the tracking results during system operation. Specifically, when a systematic deviation is detected between the predicted trajectory points and the actual detection points, the deviation is used as a new training signal to adjust the network weights, gradually correcting the trajectory inference strategy.
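Claim 7 does not say how a systematic deviation is distinguished from ordinary detection noise or how the network update is performed. A minimal sketch, assuming "systematic" means that the mean residual (detection minus corrected prediction) over a sliding window exceeds a threshold, replaces the Transformer or spatiotemporal convolutional network with a single learnable 2-D offset and updates it with a gradient step on the squared error. This is a deliberately simplified stand-in; the class name, window length, threshold and learning rate are hypothetical.

```python
import math
from collections import deque
from typing import Tuple

Vec = Tuple[float, float]


class OnlineBiasCorrector:
    """Hypothetical stand-in: a learned offset added to predictions, updated only on systematic drift."""

    def __init__(self, window: int = 10, threshold: float = 0.5, lr: float = 0.1):
        self.residuals = deque(maxlen=window)
        self.threshold = threshold
        self.lr = lr
        self.bias = [0.0, 0.0]

    def correct(self, pred: Vec) -> Vec:
        return (pred[0] + self.bias[0], pred[1] + self.bias[1])

    def observe(self, pred: Vec, detection: Vec) -> bool:
        """Record one residual; return True if it triggered a parameter update."""
        cx, cy = self.correct(pred)
        self.residuals.append((detection[0] - cx, detection[1] - cy))
        if len(self.residuals) < self.residuals.maxlen:
            return False  # not enough evidence yet
        mx = sum(r[0] for r in self.residuals) / len(self.residuals)
        my = sum(r[1] for r in self.residuals) / len(self.residuals)
        if math.hypot(mx, my) < self.threshold:
            return False  # deviations look like zero-mean noise: no update
        # Loss 0.5 * mean ||d - (p + b)||^2 has gradient -mean_residual w.r.t. b,
        # so a descent step adds lr * mean_residual (residuals from older b are reused).
        self.bias[0] += self.lr * mx
        self.bias[1] += self.lr * my
        return True


drift = OnlineBiasCorrector()
for _ in range(200):
    drift.observe((0.0, 0.0), (1.0, 0.0))  # detections sit consistently 1 unit to the right
print(drift.bias)
```

Zero-mean jitter leaves the offset untouched, while a persistent offset is largely absorbed until the windowed mean falls below the threshold; in a full system the same trigger would gate gradient updates to the attention network's weights.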