Multi-target tracking method, storage medium, and electronic device
By combining deep neural networks and particle filtering, the accuracy problem of multi-target tracking in complex scenarios has been solved, achieving stable multi-target tracking and improving its application effects in fields such as traffic safety, virtual reality, and autonomous driving.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- CASCO SIGNAL LTD
- Filing Date
- 2025-10-28
- Publication Date
- 2026-07-02
AI Technical Summary
In complex scenarios, multi-target tracking technology faces challenges such as missing target features caused by occlusion, confusion between objects and backgrounds in low-resolution videos, and changes in lighting under adverse weather conditions. These challenges reduce tracking accuracy and increase detection difficulty, limiting its application in fields such as traffic safety, virtual reality, and autonomous driving.
We employ deep neural networks to extract target features, combined with particle filtering methods. By creating and updating particle filters, we handle newly emerging and disappearing targets. We use a deep appearance feature extraction network to calculate particle weights, perform multi-level matching and target state management, and improve tracking accuracy.
It significantly improves target tracking accuracy in complex scenarios, adapts to dynamic environmental changes, and achieves stable multi-target tracking, which can be applied to fields such as passenger flow analysis, behavior modeling, and autonomous driving.
Figure CN2025130398_02072026_PF_FP_ABST
Description
A multi-target tracking method, storage medium, and electronic device
Technical Field
[0001] This invention relates to the field of computer vision technology, and in particular to a multi-target tracking method, storage medium, and electronic device.
Background Technology
[0002] In today's era, object tracking has become one of the most prominent research hotspots in the field of computer vision. With the rapid development of artificial intelligence technology, continuous improvement in hardware performance, and the rapid rise of deep network technology, the performance of object tracking in practical applications has been greatly enhanced. Related research can now tailor optimized algorithms and model structures for specific scenarios. Currently, multi-object tracking technology has gradually penetrated into many fields such as traffic safety, virtual reality, and autonomous driving.
[0003] In the field of traffic safety, given the ever-increasing number of cars and the growing pressure on road transport, the application of multi-target tracking technology provides an effective means of monitoring vehicle violations. Multi-target tracking technology can accurately capture the driving status of each vehicle and promptly detect violations. In the field of virtual reality research, virtual scenes often encompass multiple human or vehicle entities. With the help of target tracking technology, each entity can be given a certain degree of autonomous decision-making ability, making its behavior more intelligent and realistic. In the field of autonomous driving, although achieving L4 or even L5 level highly automated driving still faces many challenges, applications such as Tesla's L2 level autonomous vehicles and fixed-route autonomous trucks have already shown people the dawn of autonomous driving technology becoming a reality. In this process, multi-target tracking technology plays a key role, helping vehicles identify surrounding traffic participants and ensuring driving safety.
[0004] However, the field of multi-object tracking research still faces numerous challenging problems. For example, in complex background scenes, various objects within the field of view frequently occlude each other, resulting in missing target features and significantly interfering with tracking accuracy. Furthermore, in low-resolution videos, people or objects in distant views are easily confused with the background, adding considerable difficulty to target detection tasks. In addition, varying lighting conditions due to day-night cycles and adverse weather conditions such as rain and snow constantly test the performance of multi-object tracking algorithms, becoming bottlenecks restricting the further development of multi-object tracking technology.
Summary of the Invention
[0005] The purpose of this invention is to provide a multi-target tracking method, storage medium, and electronic device, which have the advantage of strong multi-target tracking capability.
[0006] To achieve the above objectives, the present invention provides a multi-target tracking method, comprising:
[0007] S10. Obtain the image information of the current frame, extract the target information to be matched in the current frame, match the target information to be matched with the existing targets extracted from past frames, and obtain the successfully matched targets and the unmatched targets.
[0008] S20. For targets that fail to match, determine whether they are newly created targets or disappeared targets;
[0009] S30. For newly formed targets, create a particle filter to describe the newly formed targets; for disappearing targets, delete the existing particle filter describing the disappearing targets.
[0010] S40. Output trajectory tracking results.
[0011] Optionally, step S10 includes:
[0012] S101. Create a separate motion model for each particle [formula omitted]. The particle state in the formula consists of the coordinates of the center point of the bounding box (bbox), the movement speed of the bounding box, and the length and width of the bounding box.
[0013] S102. Maintain the motion state of each particle and update the particle motion state when the next posterior knowledge arrives. The kinematic model of the particle is represented as: $x_k = f_k(x_{k-1}, v_{k-1})$, $z_k = h_k(x_k, n_k)$. In the formula, $x_k$ represents the state of the system at time k; $z_k$ represents the system's observation at time k; $f_k$ and $h_k$ represent the system's state transition function and the system's measurement function, respectively; and $v_k$ and $n_k$ represent the system noise and observation noise at time k, respectively;
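[0014] S103. Based on the observations and motion states from time 1 to k-1, predict the motion state of the system at time k, i.e., obtain $p(x_k \mid z_{1:k-1})$ from $p(x_{k-1} \mid z_{1:k-1})$. Given $x_{k-1}$, and based on the relevant system assumptions, the state $x_k$ and the observations $z_{1:k-1}$ are mutually independent, so we can obtain: $p(x_k, x_{k-1} \mid z_{1:k-1}) = p(x_k \mid x_{k-1})\, p(x_{k-1} \mid z_{1:k-1})$. Integrating the above formula over $x_{k-1}$ finally gives the corresponding credibility representation: $p(x_k \mid z_{1:k-1}) = \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid z_{1:k-1})\, dx_{k-1}$;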
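[0015] S104. Update with the system observation at time k, i.e., the process of obtaining $p(x_k \mid z_{1:k})$ from $p(x_k \mid z_{1:k-1})$, where the predicted value at time k is known. We can obtain the posterior probability: $p(x_k \mid z_{1:k}) = \dfrac{p(z_k \mid x_k)\, p(x_k \mid z_{1:k-1})}{p(z_k \mid z_{1:k-1})}$, in which $p(z_k \mid z_{1:k-1})$ can be taken as a normalization constant in practical calculations. The specific calculation process is as follows: $p(z_k \mid z_{1:k-1}) = \int p(z_k \mid x_k)\, p(x_k \mid z_{1:k-1})\, dx_k$;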
[0016] S105. A deep appearance feature extraction network for particle filtering is constructed using a deep neural network built from depthwise over-parameterized convolutional layers (DO-Conv). The appearance features extracted by the deep appearance feature extraction network are used to calculate the weights of the particles and influence the subsequent importance sampling.
[0017] S106. Calculate the similarity between particle i and the input observation.
[0018] Optionally, the input image size of the deep appearance feature extraction network is 128×64, and a 128-dimensional feature vector is finally obtained through the setting of the network layer structure.
[0019] Optionally, in S106, the similarity between particle i and the input observation is obtained according to a formula [formula omitted] whose inputs are the deep appearance feature vector of the input observation and the feature vector corresponding to a given particle in the particle filter system at the current moment.
[0020] Optionally, step S20 is detailed as follows:
[0021] S201. Based on the aforementioned similarity, determine whether the observed values of the particles are reasonable;
[0022] S202. If the observed value of the particle is unreasonable, the predicted value of the particle is used to perform state transition on the motion state.
[0023] Optionally, step S20 further includes S203: setting a target state maintenance model in which an auxiliary variable is added for each target to record the existence of the target. The target state maintenance model is: [formula omitted]. In the formula, the state of the tracking system at time t is composed of the state of the j-th target at time t together with a variable expressing the existence state of that target at time t; one value of this variable means that the target exists, and the other means that it does not exist. The number of targets at time t can be calculated from these existence variables.
[0024] Optionally, step S203 further includes: denoting all observations of the tracking system up to time t by a set [formula omitted], whose elements are the i-th observation at time k, with i running up to the number of observations at time k. Given the known observations, the state of each target can be inferred using particle filtering.
[0025] Optionally, in step S201, a judgment index b is calculated according to the following formula: [formula omitted]. When b is greater than the threshold [symbol omitted], the current predicted value is considered reasonable.
[0026] Optionally, in step S202, the state transition using the predicted value of the particle includes: [formula omitted], in which the terms are the motion state of the particles and the normalized weight of each particle.
[0027] Optionally, in step S30, for a newly formed target, creating a particle filter to describe the newly formed target includes: determining whether the unmatched target has been matched successfully in [symbol omitted] consecutive video frames; if so, it is determined to be a new target, and a particle filter is created to describe the new target.
[0028] Optionally, in step S30, for a vanishing target, deleting the existing particle filter describing the vanishing target includes: determining whether the unmatched target has failed to match in [symbol omitted] consecutive video frames; if so, the target is determined to have disappeared, and the particle filter used to describe the disappeared target is deleted.
[0029] Optionally, after step S10, the method further includes step S11: for a successfully matched target, update the particle filter state of the successfully matched target, and proceed to S40.
[0030] The present invention also provides a computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, it implements the multi-target tracking method as described above.
[0031] The present invention also provides an electronic device, including a processor and a memory, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, it implements the multi-target tracking method as described above.
[0032] In summary, compared with the prior art, the multi-target tracking method, storage medium, and electronic device provided by the present invention have the following beneficial effects:
[0033] The multi-target tracking method of this invention employs a deep neural network to extract target features, performs similarity calculations on the corresponding target depth features, and utilizes a multi-level matching approach to improve the accuracy of target matching, resulting in a more effective registration model. This multi-target tracking method can significantly improve the matching accuracy between detection results and existing targets during target tracking and trajectory generation in complex scenarios, playing an important role in fields such as passenger flow analysis, behavior modeling, and autonomous driving.
Attached Figure Description
[0034] Figure 1 is an overall flowchart of the multi-target tracking method of the present invention.
[0035] Figure 2 is a flowchart of the particle filtering method in multi-target tracking.
[0036] Figure 3 is a schematic diagram of the deep neural network residual block in this invention.
[0037] Figure 4 is a schematic diagram of the feature extraction network of the present invention.
[0038] Figure 5 is a schematic diagram of the target state maintenance process in the multi-target tracking method.
[0039] Figure 6 is a schematic diagram of the new target judgment process in the multi-target tracking method.
[0040] Figure 7 is a schematic diagram of the disappearance target judgment process in the multi-target tracking method.
Detailed Implementation
[0041] The technical solutions, structural features, achieved objectives, and effects of the present invention will be described in detail below with reference to Figures 1 to 7 in the embodiments of the present invention.
[0042] It should be noted that the accompanying drawings are in a very simplified form and use non-precise proportions. They are only used to facilitate and clarify the purpose of illustrating the embodiments of the present invention, and are not intended to limit the implementation conditions of the present invention. Therefore, they have no substantial technical significance. Any modifications to the structure, changes in the proportional relationship, or adjustments to the size should still fall within the scope of the technical content disclosed in the present invention, provided that they do not affect the effects and objectives that the present invention can produce.
[0043] It should be noted that, in this invention, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only the expressly listed elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus.
[0044] As shown in Figure 1, the present invention provides a multi-target tracking method, including:
[0045] S10. Obtain the image information of the current frame, extract the target information to be matched in the current frame, and match the target information to be matched with the existing targets extracted from previous frames to obtain successfully matched targets and unmatched targets. The successfully matched targets are added to the matched trajectory set, and the unmatched targets are added to the unmatched detection target set.
[0046] S20. For targets that fail to match, use the trajectory maintenance module to determine whether they are newly created targets or disappeared targets.
[0047] S30. For newly formed targets, create a particle filter to describe the newly formed targets; for disappearing targets, delete the existing particle filter that describes the disappearing targets.
[0048] S40. Output trajectory tracking results.
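The split that step S10 produces (matched targets, unmatched existing targets, unmatched detections) can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the greedy one-to-one matching stands in for the multi-level matching of the invention, each target is reduced to a single scalar feature, and the names `match_frame`, `similarity`, and `threshold` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def match_frame(
    existing: Dict[int, float],
    detections: List[float],
    similarity: Callable[[float, float], float],
    threshold: float,
) -> Tuple[Dict[int, int], List[int], List[int]]:
    """Greedy one-to-one matching of existing targets to detections (S10).

    Assumption: greedy matching by descending similarity replaces the
    multi-level matching described in the text.
    """
    # Score every (target, detection) pair and visit the best pairs first.
    pairs = sorted(
        (
            (similarity(feat, det), tid, j)
            for tid, feat in existing.items()
            for j, det in enumerate(detections)
        ),
        reverse=True,
    )
    matched: Dict[int, int] = {}
    used_detections = set()
    for score, tid, j in pairs:
        if score < threshold:
            break  # all remaining pairs are even less similar
        if tid in matched or j in used_detections:
            continue
        matched[tid] = j
        used_detections.add(j)
    # Unmatched existing targets are candidates for disappearance (S20/S30).
    unmatched_targets = [tid for tid in existing if tid not in matched]
    # Unmatched detections are candidates for new targets (S20/S30).
    unmatched_detections = [
        j for j in range(len(detections)) if j not in used_detections
    ]
    return matched, unmatched_targets, unmatched_detections
```

The two unmatched lists are what step S20 then classifies as newly created or disappeared targets.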
[0049] As shown in Figure 2, in this embodiment, step S10 specifically includes:
[0050] S101. Establish a unified motion model for the samples in the particle filtering method. Considering the scale changes of the target in the actual scene, length and width attributes are added to the traditional four-dimensional bounding box representation to describe the particle attributes in more detail. A separate motion model is created for each particle [formula omitted], whose state consists of the coordinates of the center point of the bounding box (bbox), the movement speed of the bounding box, and the length and width of the bounding box.
[0051] S102. Maintain the motion state of each particle and update the particle motion state when the next posterior knowledge arrives. The kinematic model of the particle is represented as: $x_k = f_k(x_{k-1}, v_{k-1})$, $z_k = h_k(x_k, n_k)$. In the formula, $x_k$ represents the state of the system at time k; $z_k$ represents the system's observation at time k; $f_k$ and $h_k$ represent the system's state transition function and the system's measurement function, respectively; and $v_k$ and $n_k$ represent the system noise and observation noise at time k, respectively;
[0052] S103. Based on the observations and motion states from time 1 to k-1, predict the motion state of the system at time k, i.e., obtain $p(x_k \mid z_{1:k-1})$ from $p(x_{k-1} \mid z_{1:k-1})$. Given $x_{k-1}$, and based on the relevant system assumptions, the state $x_k$ and the observations $z_{1:k-1}$ are mutually independent, so we can obtain: $p(x_k, x_{k-1} \mid z_{1:k-1}) = p(x_k \mid x_{k-1})\, p(x_{k-1} \mid z_{1:k-1})$. Integrating the above formula over $x_{k-1}$ finally gives the corresponding credibility representation: $p(x_k \mid z_{1:k-1}) = \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid z_{1:k-1})\, dx_{k-1}$;
[0053] S104. Update with the system observation at time k, i.e., the process of obtaining $p(x_k \mid z_{1:k})$ from $p(x_k \mid z_{1:k-1})$, where the predicted value at time k is known. We can obtain the posterior probability: $p(x_k \mid z_{1:k}) = \dfrac{p(z_k \mid x_k)\, p(x_k \mid z_{1:k-1})}{p(z_k \mid z_{1:k-1})}$, in which $p(z_k \mid z_{1:k-1})$ can be taken as a normalization constant in practical calculations. The specific calculation process is as follows: $p(z_k \mid z_{1:k-1}) = \int p(z_k \mid x_k)\, p(x_k \mid z_{1:k-1})\, dx_k$;
[0054] S105. To better extract target appearance features, a deep appearance feature extraction network for particle filtering is constructed using a deep neural network built from depthwise over-parameterized convolutional layers (DO-Conv). The residual block of the DO-Conv convolutional layer is shown in Figure 3. The appearance features extracted by the deep appearance feature extraction network are used to calculate the particle weights and influence the subsequent importance sampling.
[0055] The feature extraction network used in this invention has the specific network structure shown in Figure 4. The input image size of the network is set to 128×64, and a 128-dimensional feature vector is finally obtained through the setting of subsequent network layer structures.
[0056] S106. Calculate the similarity between particle i and the input observation. Specifically, with weights calculated from reparameterized deep appearance features, during the importance sampling process of particle filtering the particle filtering system calculates the similarity between the currently maintained particle attributes and the system's input observations to obtain the weights of the corresponding particles. To incorporate the influence of deep features into the parameter calculation, a DO-Conv network model is used as the basic structure. In the importance sampling step, the model first samples the original video frames based on the particle's motion state, then passes the extracted feature vectors to the weight calculation module for similarity calculation, and finally calculates and updates the particle weights using a normalization method. The similarity between particle i and the input observation is obtained according to a formula [formula omitted] whose inputs are the deep appearance feature vector of the input observation and the feature vector corresponding to a particle in the particle filter system at the current time. The resulting similarity between particle i and the input observation serves as the main basis for the subsequent weight calculation.
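Steps S101 to S106 can be sketched in a few lines of NumPy. The sketch is illustrative only: the constant-velocity transition, the Gaussian noise, the choice of cosine similarity as the appearance similarity, and the function names `predict`, `cosine_similarity`, and `normalize_weights` are all assumptions, because the formulas of the invention are not reproduced in this text. Feature vectors are taken as given; in the described method they would come from the DO-Conv feature extraction network.

```python
import numpy as np


def predict(particles: np.ndarray, dt: float, noise_std: float,
            rng: np.random.Generator) -> np.ndarray:
    """Propagate particles with an assumed constant-velocity model.

    Each row is [cx, cy, vx, vy, length, width]: bounding-box center,
    velocity, and size, following the state layout described in S101.
    """
    out = particles.copy()
    out[:, 0] += dt * particles[:, 2]  # center x advances by vx * dt
    out[:, 1] += dt * particles[:, 3]  # center y advances by vy * dt
    # Additive Gaussian system noise (an assumption of this sketch).
    out += rng.normal(0.0, noise_std, size=out.shape)
    return out


def cosine_similarity(obs_feat: np.ndarray,
                      particle_feats: np.ndarray) -> np.ndarray:
    """Similarity between the observation feature and each particle feature.

    Cosine similarity is an assumed stand-in for the omitted formula.
    """
    obs = obs_feat / np.linalg.norm(obs_feat)
    feats = particle_feats / np.linalg.norm(particle_feats, axis=1,
                                            keepdims=True)
    return feats @ obs


def normalize_weights(similarities: np.ndarray) -> np.ndarray:
    """Turn similarities into particle weights that sum to one."""
    w = np.clip(similarities, 0.0, None) + 1e-12  # keep weights positive
    return w / w.sum()
```

In a full tracker the normalized weights would then drive resampling, so that particles whose appearance agrees with the observation are more likely to survive.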
[0057] In this embodiment, step S20 specifically includes:
[0058] S201. Based on the similarity, determine whether the observed values of a particle are reasonable. The reasonableness of the observed values is determined by calculating a judgment index b, which is obtained by the following formula: [formula omitted]. When b is greater than the threshold [symbol omitted], the current predicted value is considered reasonable.
[0059] S202. If the observed value of a particle is unreasonable, the predicted value of the particle is used to perform a state transition on the motion state, thereby generating a new particle around the original position as an adaptive result for the next operation, until the next observed value arrives or the particle filter is deleted by the system. Using particle filtering with the output of the target registration module as the observed value therefore makes it possible to maintain the target trajectory state. When the observed value is considered unreasonable, the state transition using the predicted value of the particle includes: [formula omitted], in which the terms are the motion state of the particles and the normalized weight of each particle.
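The fallback in S202 can be sketched as follows. Since the state transition formula is not reproduced in this text, the sketch assumes that the predicted value is the weight-averaged particle state, which is one common way to combine particle motion states with normalized weights; the names `weighted_prediction` and `next_state` are hypothetical, and the reasonableness decision from S201 is passed in as a boolean rather than recomputed.

```python
from typing import List, Sequence


def weighted_prediction(particles: Sequence[Sequence[float]],
                        weights: Sequence[float]) -> List[float]:
    """Weighted mean of particle states (assumed form of the prediction)."""
    total = sum(weights)  # normalizes the weights if they are not already
    dim = len(particles[0])
    return [
        sum(w * p[d] for p, w in zip(particles, weights)) / total
        for d in range(dim)
    ]


def next_state(particles: Sequence[Sequence[float]],
               weights: Sequence[float],
               observation: Sequence[float],
               observation_is_reasonable: bool) -> List[float]:
    """Use the observation when it is judged reasonable (S201);
    otherwise fall back to the particle prediction (S202)."""
    if observation_is_reasonable:
        return list(observation)
    return weighted_prediction(particles, weights)
```

For example, with particles `[[0, 0], [2, 4]]` and weights `[1, 3]`, an unreasonable observation is replaced by the prediction `[1.5, 3.0]`.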
[0060] Step S20 also includes S203: setting up a target state maintenance model in which an auxiliary variable is added for each target to record the existence of the target. The target state maintenance model is as follows: [formula omitted]. In the formula, the state of the tracking system at time t is composed of the state of the j-th target at time t together with a variable expressing the existence state of that target at time t; one value of this variable means that the target exists, and the other means that it does not exist. The number of targets at time t can be calculated from these existence variables.
[0061] Step S203 also includes: denoting all observations of the tracking system up to time t by a set [formula omitted], whose elements are the i-th observation at time k, with i running up to the number of observations at time k. Given the known observations, the state of each target can be inferred using particle filtering.
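A minimal Python sketch of the target state maintenance idea in S203 follows. It assumes that the auxiliary existence variable is encoded as 1 for an existing target and 0 for a non-existing one, so that the target count is the sum of the existence variables; this encoding and the names `TargetState` and `count_targets` are illustrative assumptions, not taken from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TargetState:
    """State of one target plus its auxiliary existence variable."""
    state: Tuple[float, ...]  # e.g. bounding-box center, velocity, size
    exists: int               # assumed encoding: 1 = exists, 0 = does not


def count_targets(system_state: List[TargetState]) -> int:
    """Number of targets present, as the sum of existence variables."""
    return sum(t.exists for t in system_state)
```

Keeping the existence flag next to each target state lets the tracker retain a particle filter for a target that is temporarily unobserved without counting it as present.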
[0062] In step S30, for newly formed targets as shown in Figure 6, creating a particle filter to describe the newly formed target includes: determining whether the unmatched target has been matched successfully in [symbol omitted] consecutive video frames; if so, it is determined to be a new target, and a particle filter is created to describe the new target.
[0063] For a new target, as shown at D2 in Figure 5, the state at the corresponding adjacent time step is [state omitted], and the target is considered a newly emerging target to be confirmed. The states of targets D3 and D4 at adjacent time points are respectively [states omitted]. Because D2 has been matched successfully in [symbol omitted] consecutive video frames, the target is confirmed as a new target and can be used as a normal matching target in the future.
[0064] In step S30, for the disappearing target as shown in Figure 7, deleting the existing particle filter describing the disappearing target includes: determining whether the unmatched target has failed to match in [symbol omitted] consecutive video frames; if so, the target is determined to have disappeared, and the particle filter used to describe the disappeared target is deleted.
[0065] For disappearing targets, as shown at A1, A2, and A3 in Figure 5, each target for which matching has been attempted corresponds to a certain state at each time step, and this state changes based on the input of the next sequence. The states of A1 at adjacent time steps in Figure 5 are [states omitted], and the target is considered a target to be confirmed as missing. The states of targets A2 and A3 at adjacent time points are respectively [states omitted]. Because target A1 has failed to match in [symbol omitted] consecutive video frames, the target is confirmed as a missing target and the deletion operation is performed.
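The confirmation and deletion logic of step S30 amounts to counting consecutive matches and consecutive misses. The Python sketch below is a hedged illustration: the frame thresholds `confirm_after` and `delete_after` are hypothetical parameters standing in for the frame counts that are not reproduced in this text, and the status labels are invented for readability.

```python
from dataclasses import dataclass


@dataclass
class Lifecycle:
    """Tracks consecutive matches and misses for one candidate target."""
    confirm_after: int  # assumed: consecutive matches needed to confirm
    delete_after: int   # assumed: consecutive misses needed to delete
    hits: int = 0
    misses: int = 0
    status: str = "tentative"

    def update(self, matched: bool) -> str:
        """Feed the matching result of one frame and return the status."""
        if matched:
            self.hits += 1
            self.misses = 0  # a match interrupts any run of misses
            if self.status == "tentative" and self.hits >= self.confirm_after:
                self.status = "confirmed"  # create the particle filter here
        else:
            self.misses += 1
            self.hits = 0  # a miss interrupts any run of matches
            if self.misses >= self.delete_after:
                self.status = "deleted"    # delete the particle filter here
        return self.status
```

A tracker would hold one such object per target to be confirmed, creating the particle filter on the transition to `confirmed` and removing it on the transition to `deleted`.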
[0066] As a preferred embodiment, after step S10, step S11 is further included: for the successfully matched target, an improved particle filtering method based on overparameterized feature network is applied to update the particle filter state of the successfully matched target, update the target trajectory, and proceed to S40.
[0067] The multi-target tracking method of this invention, based on overparameterized feature networks and particle filtering, can adapt to complex problems such as dynamic environments, appearance changes, and changes in the number of targets in real-world scenarios, achieving effective and stable multi-target tracking results. This invention's multi-target tracking method is based on detection-based multi-target tracking technology for model design and system construction. It employs deep neural networks to extract target features, performs similarity calculations on corresponding target depth features, and optimizes the feature extraction network using various related methods such as the center loss function. Furthermore, it uses a multi-level matching approach to improve the accuracy of target matching, thereby obtaining a more effective registration model. This invention's multi-target tracking method can significantly improve the registration accuracy of detection results with existing targets during target tracking and trajectory generation in complex scenarios, and has significant application value in fields such as passenger flow analysis, behavior modeling, and autonomous driving.
[0068] In the multi-target tracking method of this invention, after processing by the detection and target registration modules, the multi-target tracking model can match the detection results in the current frame with previously confirmed target trajectories. However, matching results alone cannot form a continuous tracking trajectory, nor can they determine the emergence or disappearance of targets. Therefore, the tracking model requires a trajectory generation module to persistently process the registration results, marking the positions of the same target in different frames and ultimately obtaining the tracking trajectory corresponding to a specific target. The scheme of this invention uses particle filtering as the main method of the trajectory generation module. However, the registration results may still contain mismatches or missing matches. If the registration results are used directly as the tracking trajectory, the trajectory may deviate, and after long-term error accumulation this degrades the performance of the tracking model.
[0069] Therefore, the trajectory generation module needs to optimize and correct the registration results while maintaining the tracking trajectory to obtain a more stable tracking trajectory. Furthermore, in multi-target tracking scenarios, there is also the issue of changing target numbers, i.e., the emergence and disappearance of targets. Some targets may enter or leave the field of view over time, resulting in the same target potentially having multiple states during the tracking task, with transitions between states. The trajectory generation module needs to have a certain ability to handle emerging or disappearing targets. To implement the trajectory generation module and simultaneously ensure the stability of its trajectory and the management of emerging or disappearing targets, this invention introduces depth features to improve the reliability of particle filter weights; target state management is performed in the particle filter to determine emerging and disappearing targets, thereby designing and optimizing the trajectory generation module in the multi-target tracking framework.
[0070] The present invention also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the multi-target tracking method described above.
[0071] The present invention also provides an electronic device, including a processor and a memory, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, it implements the multi-target tracking method described above.
[0072] Although the present invention has been described in detail through the preferred embodiments above, it should be understood that the above description should not be considered as a limitation of the present invention. Various modifications and substitutions to the present invention will be apparent to those skilled in the art after reading the above description. Therefore, the scope of protection of the present invention should be defined by the appended claims.
Claims
1. A multi-target tracking method, characterized in that the multi-target tracking method includes: S10. Obtain the image information of the current frame, extract the target information to be matched in the current frame, match the target information to be matched with the existing targets extracted from past frames, and obtain the successfully matched targets and the unmatched targets; S20. For targets that fail to match, determine whether they are newly created targets or disappeared targets; S30. For newly formed targets, create a particle filter to describe the newly formed targets; for disappearing targets, delete the existing particle filter describing the disappearing targets; S40. Output trajectory tracking results.
2. The multi-target tracking method as described in claim 1, characterized in that step S10 includes: S101. Create a separate motion model for each particle [formula omitted], whose state consists of the coordinates of the center point of the bounding box (bbox), the movement speed of the bounding box, and the length and width of the bounding box; S102. Maintain the motion state of each particle and update the particle motion state when the next posterior knowledge arrives. The kinematic model of the particle is represented as: $x_k = f_k(x_{k-1}, v_{k-1})$, $z_k = h_k(x_k, n_k)$. In the formula, $x_k$ represents the state of the system at time k; $z_k$ represents the system's observation at time k; $f_k$ and $h_k$ represent the system's state transition function and the system's measurement function, respectively; and $v_k$ and $n_k$ represent the system noise and observation noise at time k, respectively; S103. Based on the observations and motion states from time 1 to k-1, predict the motion state of the system at time k, i.e., obtain $p(x_k \mid z_{1:k-1})$ from $p(x_{k-1} \mid z_{1:k-1})$. Given $x_{k-1}$, and based on the relevant system assumptions, the state $x_k$ and the observations $z_{1:k-1}$ are mutually independent, so we can obtain: $p(x_k, x_{k-1} \mid z_{1:k-1}) = p(x_k \mid x_{k-1})\, p(x_{k-1} \mid z_{1:k-1})$. Integrating the above formula over $x_{k-1}$ finally gives the corresponding credibility representation: $p(x_k \mid z_{1:k-1}) = \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid z_{1:k-1})\, dx_{k-1}$; S104. Update with the system observation at time k, i.e., the process of obtaining $p(x_k \mid z_{1:k})$ from $p(x_k \mid z_{1:k-1})$, where the predicted value at time k is known. We can obtain the posterior probability: $p(x_k \mid z_{1:k}) = \dfrac{p(z_k \mid x_k)\, p(x_k \mid z_{1:k-1})}{p(z_k \mid z_{1:k-1})}$, in which $p(z_k \mid z_{1:k-1})$ can be taken as a normalization constant in practical calculations. The specific calculation process is as follows: $p(z_k \mid z_{1:k-1}) = \int p(z_k \mid x_k)\, p(x_k \mid z_{1:k-1})\, dx_k$; S105. A deep appearance feature extraction network for particle filtering is constructed using a deep neural network built from depthwise over-parameterized convolutional layers (DO-Conv). The appearance features extracted by the deep appearance feature extraction network are used to calculate the weights of the particles and influence the subsequent importance sampling; S106. Calculate the similarity between particle i and the input observation.
3. The multi-target tracking method as described in claim 2, characterized in that the input image size of the deep appearance feature extraction network is 128×64, and a 128-dimensional feature vector is finally obtained through the setting of the network layer structure.
4. The multi-target tracking method as described in claim 2, characterized in that, in S106, the similarity between particle i and the input observation is obtained according to a formula [formula omitted] whose inputs are the deep appearance feature vector of the input observation and the feature vector corresponding to a given particle in the particle filter system at the current moment.
5. The multi-target tracking method as described in claim 2, characterized in that step S20 includes: S201. Based on the aforementioned similarity, determine whether the observed values of the particles are reasonable; S202. If the observed value of the particle is unreasonable, the predicted value of the particle is used to perform state transition on the motion state.
6. The multi-target tracking method as described in claim 5, characterized in that step S20 further includes S203: setting a target state maintenance model in which an auxiliary variable is added for each target to record the existence of the target, wherein the target state maintenance model is: [formula omitted]. In the formula, the state of the tracking system at time t is composed of the state of the j-th target at time t together with a variable expressing the existence state of that target at time t; one value of this variable means that the target exists, and the other means that it does not exist. The number of targets at time t can be calculated from these existence variables.
7. The multi-target tracking method as described in claim 6, characterized in that step S203 also includes: denoting all observations of the tracking system up to time t by a set [formula omitted], whose elements are the i-th observation at time k, with i running up to the number of observations at time k. Given the known observations, the state of each target can be inferred using particle filtering.
8. The multi-target tracking method as described in claim 5, characterized in that, in step S201, a judgment index b is calculated according to the following formula: [formula omitted]. When b is greater than the threshold [symbol omitted], the current predicted value is considered reasonable.
9. The multi-target tracking method as described in claim 5, characterized in that, in step S202, the state transition using the predicted value of the particle includes: [formula omitted], in which the terms are the motion state of the particles and the normalized weight of each particle.
10. The multi-target tracking method as described in claim 1, characterized in that, in step S30, for a newly emerging target, creating a particle filter to describe the newly emerging target includes: determining whether the unmatched target has been matched successfully in [symbol omitted] consecutive video frames; if so, it is determined to be a new target, and a particle filter is created to describe the new target.
11. The multi-target tracking method as described in claim 1, characterized in that, in step S30, for a vanishing target, deleting the existing particle filter describing the vanishing target includes: determining whether the unmatched target has failed to match in [symbol omitted] consecutive video frames; if so, the target is determined to have disappeared, and the particle filter used to describe the disappeared target is deleted.
12. The multi-target tracking method as described in claim 1, characterized in that, After step S10, the process also includes step S11: for a successfully matched target, update the particle filter state of the successfully matched target and proceed to S40.
13. A computer-readable storage medium having a computer program stored thereon, characterized in that, When the computer program is executed by a processor, it implements the multi-target tracking method as described in any one of claims 1-12.
14. An electronic device, characterized in that, It includes a processor and a memory, wherein the memory stores a computer program, which, when executed by the processor, implements the multi-target tracking method as described in any one of claims 1-12.