Multi-target tracking method and device, electronic equipment and storage medium

By estimating the confidence score of the global motion transformation matrix in a multi-target tracking system and performing adaptive gating, the problems of trajectory drift and identity switching in strong jitter scenarios are solved, thereby improving the stability and continuity of multi-target tracking.

CN122265342A · Pending · Publication Date: 2026-06-23 · GUANGDONG JIECHUANG ROBOT CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
GUANGDONG JIECHUANG ROBOT CO LTD
Filing Date
2026-03-25
Publication Date
2026-06-23

Figure CN122265342A_ABST

Abstract

The application provides a multi-target tracking method and device, electronic equipment and storage medium, and relates to the technical field of computer vision, and the multi-target tracking method comprises the following steps: acquiring a video frame sequence, performing target detection on a current video frame to obtain a detection box set, and performing state prediction on a historical trajectory set to obtain a predicted trajectory set; estimating a current global motion transformation matrix between the current video frame and a previous video frame in the video frame sequence, and extracting estimation process information; calculating a confidence score of the current global motion transformation matrix based on the estimation process information; performing adaptive gating processing on the current global motion transformation matrix based on the confidence score to generate a target compensation matrix; and performing data association and trajectory updating on the detection box set and the predicted trajectory set based on the confidence score and the target compensation matrix to obtain a target association result. The application can improve the stability and continuity of multi-target tracking in a strong jitter inspection scene.

Description

Technical Field

[0001] This invention relates to the field of computer vision technology, and in particular to a multi-target tracking method, apparatus, electronic device, and storage medium.

Background Technology

[0002] With the development of deep learning-based object detection and appearance feature representation technologies, current multi-object tracking systems generally adopt a "detection-prediction-association" framework: each frame of image is used to detect objects and extract appearance features, motion models such as Kalman filtering are used to predict the target state, and data association is completed by combining IoU (Intersection over Union) and appearance feature distance, thereby outputting a continuous trajectory. To reduce global displacement interference caused by camera motion, some tracking systems further introduce GMC / CMC (Global Motion Compensation / Camera Motion Compensation) modules. These modules estimate the global transformation matrix between adjacent frames through optical flow or feature matching, and perform motion compensation on the trajectory state or detection results accordingly to improve the stability of cross-frame association.

[0003] However, in high-motion scenarios such as robot dog inspections, the camera is usually rigidly connected to the platform. Platform movement is prone to high-frequency jitter, rapid turns, and sudden attitude changes, causing the global motion between adjacent frames to exhibit non-stationary, non-slow characteristics with significant high-frequency components. Simultaneously, inspection environments often involve multiple people interacting, dense targets, and frequent occlusion, resulting in insufficient stable background areas and usable background features, making global motion estimation more susceptible to bias or degradation. Under these conditions, even existing tracking systems integrating GMC / CMC may introduce compensation errors due to unreliable global motion estimation. These errors accumulate and propagate during subsequent prediction, association, and update processes, leading to problems such as trajectory drift, target loss, and increased identity switching, reducing the inspection system's ability to continuously and stably track multiple targets.

[0004] Therefore, improving the stability and continuity of multi-target tracking in high-vibration inspection scenarios has become a pressing technical challenge in this field.

Summary of the Invention

[0005] This invention provides a multi-target tracking method, apparatus, electronic device, and storage medium, which can improve the stability and continuity of multi-target tracking in high-vibration inspection scenarios.

[0006] This invention provides a multi-target tracking method, comprising: acquiring a video frame sequence, performing target detection on the current video frame to obtain a detection box set, and performing state prediction on a historical trajectory set to obtain a predicted trajectory set; estimating the current global motion transformation matrix between the current video frame and the previous video frame in the video frame sequence, and extracting estimation process information; calculating a confidence score of the current global motion transformation matrix based on the estimation process information; performing adaptive gating on the current global motion transformation matrix based on the confidence score to generate a target compensation matrix; and performing data association and trajectory updating on the detection box set and the predicted trajectory set based on the confidence score and the target compensation matrix to obtain a target association result.

[0007] According to a multi-target tracking method provided by the present invention, the estimation process information includes at least matching statistics, estimation stability information, and distribution coverage information. Calculating the confidence score of the current global motion transformation matrix based on the estimation process information includes: determining an inlier consistency index based on the total number of matched point pairs and the number of inliers in the matching statistics; determining a temporal continuity index based on the degree of difference between the current global motion transformation matrix and the previous global motion transformation matrix in the estimation stability information; determining a distribution coverage index based on the number of grid cells and the grid occupancy information in the distribution coverage information; and performing weighted fusion of the inlier consistency index, the temporal continuity index, and the distribution coverage index to obtain the confidence score.

[0008] According to a multi-target tracking method provided by the present invention, performing weighted fusion of the inlier consistency index, the temporal continuity index, and the distribution coverage index to obtain the confidence score includes: computing a weighted sum of the three indices to obtain a basic fusion value; and processing the basic fusion value with a preset function to obtain the confidence score, where the preset function is any one of a truncation function, a piecewise function, or a nonlinear mapping function.

[0009] According to a multi-target tracking method provided by the present invention, adaptively gating the current global motion transformation matrix based on the confidence score to generate a target compensation matrix includes: when the confidence score is greater than or equal to a first preset threshold, using the current global motion transformation matrix as the target compensation matrix; when the confidence score is less than the first preset threshold and greater than a second preset threshold, performing adaptive attenuation on the current global motion transformation matrix to obtain the target compensation matrix; and when the confidence score is less than or equal to the second preset threshold, using the identity (unit) transformation matrix as the target compensation matrix.

[0010] According to a multi-target tracking method provided by the present invention, performing adaptive attenuation on the current global motion transformation matrix to obtain the target compensation matrix includes: determining a target gating coefficient based on the confidence score, the first preset threshold, and the second preset threshold; and using the target gating coefficient to interpolate between the current global motion transformation matrix and the identity transformation matrix, thereby generating the target compensation matrix.
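The three-branch gating of [0009] and the interpolation of [0010] can be sketched as follows. This is a minimal illustration, not the applicant's implementation. It makes three assumptions:

- the matrix is stored as a 3×3 homogeneous affine matrix;
- the threshold values are hypothetical;
- the gating coefficient is linear, α = (S − τ_low) / (τ_high − τ_low), because the source does not give the exact formula.

```python
import numpy as np


def target_compensation_matrix(A_t: np.ndarray, score: float,
                               tau_high: float = 0.7,
                               tau_low: float = 0.3) -> np.ndarray:
    """Adaptive gating of a 3x3 homogeneous global motion matrix (sketch)."""
    identity = np.eye(3)
    if score >= tau_high:
        # Trusted estimate: use the current global motion matrix unchanged.
        return A_t.copy()
    if score <= tau_low:
        # Untrusted estimate: fall back to the identity (no compensation).
        return identity
    # Intermediate band: hypothetical linear gating coefficient in (0, 1).
    alpha = (score - tau_low) / (tau_high - tau_low)
    # Element-wise interpolation between the identity and the estimate.
    return identity + alpha * (A_t - identity)


# Usage: a pure translation of (10, -4) px with a mid-range score of 0.5
A = np.array([[1.0, 0.0, 10.0], [0.0, 1.0, -4.0], [0.0, 0.0, 1.0]])
print(target_compensation_matrix(A, 0.5)[:2, 2])  # translation roughly halved
```

Because the interpolation is element-wise and both endpoints share the bottom row [0, 0, 1], the result remains a valid homogeneous affine matrix.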

[0011] According to a multi-target tracking method provided by the present invention, performing data association and trajectory updating on the detection box set and the predicted trajectory set based on the confidence score and the target compensation matrix to obtain the target association result includes:

- compensating the predicted trajectory set based on the target compensation matrix;
- calculating a geometric distance based on the detection box set and the compensated predicted trajectory set;
- extracting the appearance feature vector of each detection box in the detection box set to obtain an appearance feature set;
- calculating an appearance distance based on the appearance feature set and the historical appearance feature vectors of the trajectories in the predicted trajectory set;
- adjusting the weights of the geometric distance and the appearance distance based on the confidence score;
- calculating an association cost based on the adjusted weights, the geometric distance, and the appearance distance;
- matching the detection box set against the predicted trajectory set based on the association cost to obtain a matching result;
- updating the trajectories based on the matching result to determine the target association result.

[0012] According to a multi-target tracking method provided by the present invention, the matching result includes at least one of the following:

- successfully matched pairs of target detection boxes and target predicted trajectories;
- unmatched predicted trajectories;
- unmatched detection boxes.

Updating the trajectories based on the matching result to determine the target association result includes at least one of the following processes:

- The target predicted trajectory is filtered and updated based on the position of its matched target detection box to obtain an updated target trajectory. When the confidence score is less than or equal to a preset confidence downgrade threshold, the process noise used in the filtering update is increased.
- The unmatched predicted trajectories are marked as lost to obtain lost trajectories. When the confidence score is less than or equal to the preset confidence downgrade threshold, the retention time of the unmatched predicted trajectories is extended.
- The unmatched detection boxes are initialized as new trajectories. When the confidence score is less than or equal to the preset confidence downgrade threshold, the conditions for initializing a new trajectory are made stricter.

The target association result includes at least one of the state-updated target trajectories, the lost trajectories, and the newly generated trajectories.
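One way to realize the confidence-downgrade behavior of [0012] is a per-frame policy object that the tracker consults before its update step. The field names, default values, and adjustment factors below are hypothetical choices for illustration. The source states only the direction of each adjustment: more process noise, longer retention of lost tracks, and stricter initialization of new tracks.

```python
from dataclasses import dataclass


@dataclass
class TrackPolicy:
    process_noise_scale: float  # multiplier applied to the filter's process noise
    max_lost_frames: int        # how long an unmatched track is kept in the "lost" state
    new_track_min_conf: float   # detection confidence needed to start a new track


def policy_for_frame(score: float, downgrade_threshold: float = 0.4) -> TrackPolicy:
    """Return the (hypothetical) tracking policy for a frame with the given confidence score."""
    base = TrackPolicy(process_noise_scale=1.0, max_lost_frames=30, new_track_min_conf=0.6)
    if score > downgrade_threshold:
        return base
    # Degraded global motion: widen the filter's uncertainty, keep lost tracks longer,
    # and make new-track initialization stricter.
    return TrackPolicy(
        process_noise_scale=base.process_noise_scale * 2.0,
        max_lost_frames=base.max_lost_frames * 2,
        new_track_min_conf=min(0.95, base.new_track_min_conf + 0.15),
    )
```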

[0013] The present invention also provides a multi-target tracking device, comprising:

- a set acquisition module, configured to acquire a video frame sequence, perform target detection on the current video frame to obtain a detection box set, and perform state prediction on a historical trajectory set to obtain a predicted trajectory set;
- a global motion estimation module, configured to estimate the current global motion transformation matrix between the current video frame and the previous video frame in the video frame sequence, and to extract estimation process information;
- a confidence assessment module, configured to calculate the confidence score of the current global motion transformation matrix based on the estimation process information;
- a gating processing module, configured to perform adaptive gating on the current global motion transformation matrix based on the confidence score to generate a target compensation matrix;
- an association result determination module, configured to perform data association and trajectory updating on the detection box set and the predicted trajectory set based on the confidence score and the target compensation matrix to obtain the target association result.

[0014] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the multi-target tracking method as described above.

[0015] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the multi-target tracking method as described in any of the preceding claims.

[0016] The multi-target tracking method, apparatus, electronic device, and storage medium provided by this invention work as follows.

First, they acquire a video frame sequence, perform target detection on the current video frame to obtain a set of detection boxes, and perform state prediction on the historical trajectory set to obtain a set of predicted trajectories.

Then, they estimate the current global motion transformation matrix between the current video frame and the previous video frame in the video frame sequence and extract the estimation process information. Based on this information, they calculate the confidence score of the current global motion transformation matrix. This enables dynamic perception and quantification of the global motion estimation quality, overcoming the shortcoming of existing technologies that blindly rely on compensation results in complex environments.

Next, they perform adaptive gating on the current global motion transformation matrix based on the confidence score to generate a target compensation matrix. In high-frequency jitter or attitude-change scenarios such as robot dog inspections, this adaptively intercepts, weakens, or corrects abnormal matrices, blocking erroneous global displacement interference at its source.

Finally, based on the confidence score and the target compensation matrix, they perform data association and trajectory updating on the detection box set and the predicted trajectory set to obtain the target association result. Because the reliability assessment of global motion is integrated into the underlying tracking logic, the method suppresses the injection and cumulative propagation of errors caused by motion compensation failure in strong jitter scenarios. This reduces trajectory drift and identity switching and improves the stability and continuity of multi-target tracking.

Attached Figure Description

[0017] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0018] Figure 1 is the first flowchart of the multi-target tracking method provided by the present invention.

[0019] Figure 2 is the second flowchart of the multi-target tracking method provided by the present invention.

[0020] Figure 3 is the third flowchart of the multi-target tracking method provided by the present invention.

[0021] Figure 4 is the fourth flowchart of the multi-target tracking method provided by the present invention.

[0022] Figure 5 is a schematic structural diagram of the multi-target tracking device provided by the present invention.

[0023] Figure 6 is a schematic structural diagram of the electronic device provided by the present invention.

Detailed Implementation

[0024] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.

[0025] Based on the above analysis, this invention proposes a multi-target tracking method, device, electronic device, and storage medium, which are described below with reference to Figures 1-6.

[0026] Figure 1 is the first flowchart of the multi-target tracking method provided by the present invention. As shown in Figure 1, the multi-target tracking method includes steps S110, S120, S130, S140, and S150.

[0027] Step S110: Obtain the video frame sequence, perform target detection on the current video frame to obtain a set of detection boxes, and perform state prediction on the historical trajectory set to obtain a set of predicted trajectories.

[0028] A continuous sequence of video frames captured by a camera on a mobile platform (e.g., a robotic dog) is acquired, and the current video frame is analyzed with an object detection algorithm. Target detection is performed on the video frame $I_t$ at the current time t to obtain the detection box set $\mathcal{D}_t = \{d_t^k\}_{k=1}^{N_t}$, with $d_t^k = (b_t^k, c_t^k)$ and $b_t^k = (x_t^k, y_t^k, w_t^k, h_t^k)$. The symbols are defined as follows:

- $\mathcal{D}_t$ is the set of detection boxes of the current video frame $I_t$;
- $d_t^k$ is the detection of the k-th target in $I_t$;
- $N_t$ is the number of detection boxes in $I_t$;
- $b_t^k$ is the bounding box of the k-th target, comprising the top-left corner coordinates $(x_t^k, y_t^k)$, the width $w_t^k$, and the height $h_t^k$;
- $c_t^k$ is the detection confidence.

[0029] The historical trajectory set, i.e., the set of trajectories established at the previous time t-1, is denoted as $\mathcal{T}_{t-1} = \{\tau_{t-1}^j\}_{j=1}^{M}$. Here, $\mathcal{T}_{t-1}$ is the trajectory set at the previous time t-1 (the historical trajectory set). $\tau_{t-1}^j$ is the state information of the j-th trajectory at time t-1, including at least the trajectory ID (identifier) and the bounding box state. M is the number of trajectories, i.e., the number of elements in $\mathcal{T}_{t-1}$.

[0030] Based on a preset motion model, each trajectory in the historical trajectory set $\mathcal{T}_{t-1}$ is predicted to obtain the predicted trajectory set $\hat{\mathcal{T}}_t = \{\hat{\tau}_t^j\}_{j=1}^{M}$. Here, $\hat{\mathcal{T}}_t$ is the set of predicted trajectories at the current time t, obtained from $\mathcal{T}_{t-1}$ through motion-model prediction, and $\hat{\tau}_t^j$ is the predicted state of the j-th trajectory at the current time t.

[0031] Furthermore, the preset motion model can employ a Kalman filter or an equivalent state-space model, etc.
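For concreteness, a Kalman filter prediction step might look like the minimal sketch below. It assumes a constant-velocity model over an 8-dimensional state (box center, width, height, and their velocities) and a simple isotropic process noise. These modeling choices are assumptions, not details given in the source.

```python
import numpy as np


def kalman_predict(x: np.ndarray, P: np.ndarray, q: float = 1e-2, dt: float = 1.0):
    """One constant-velocity Kalman prediction step.

    State x = [cx, cy, w, h, vcx, vcy, vw, vh]: box center, size, and their velocities.
    """
    n = 4                                   # number of box parameters
    F = np.eye(2 * n)
    F[:n, n:] = dt * np.eye(n)              # position/size += velocity * dt
    Q = q * np.eye(2 * n)                   # isotropic process noise (assumption)
    x_pred = F @ x                          # predicted state mean
    P_pred = F @ P @ F.T + Q                # predicted state covariance
    return x_pred, P_pred
```

Scaling `q` up is one simple way the low-confidence process-noise increase of [0012] could be applied.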

[0032] Step S120: Estimate the current global motion transformation matrix between the current video frame and the previous video frame in the video frame sequence, and extract the estimation process information.

[0033] The previous video frame is the frame immediately preceding the current video frame in the video frame sequence. For example, if the current video frame is the frame $I_t$ at the current time t, the previous video frame is the frame $I_{t-1}$ at time t-1.

[0034] The global camera motion between the adjacent frames $I_{t-1}$ and $I_t$ is estimated. The global motion transformation matrix from the previous video frame $I_{t-1}$ to the current video frame $I_t$ is output and denoted as the current global motion transformation matrix $A_t$.

[0035] Furthermore, estimation methods may include, but are not limited to: (1) Sparse optical flow tracking of key points combined with RANSAC (Random Sample Consensus) fitting of affine transformation; (2) Feature point matching combined with RANSAC fitting affine transformation, wherein feature point matching can use algorithms such as ORB (Oriented FAST and Rotated BRIEF) and SIFT (Scale-Invariant Feature Transform); (3) Image registration estimation affine transformation, wherein the ECC (Enhanced Correlation Coefficient) registration algorithm can be used for image registration.
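The sketch below makes the estimation step concrete. It fits an affine transformation to already-matched point pairs with a plain NumPy RANSAC loop and returns some of the statistics listed in [0038] to [0040]. It assumes the correspondences come from an upstream optical-flow or feature-matching stage. The threshold, iteration count, and `info` keys are hypothetical. In practice, a library routine such as OpenCV's `cv2.estimateAffine2D` (RANSAC mode) would normally replace the hand-written loop.

```python
import numpy as np


def fit_affine_lstsq(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares 2x3 affine fit mapping src (N, 2) onto dst (N, 2)."""
    X = np.hstack([src, np.ones((len(src), 1))])       # homogeneous source points (N, 3)
    params, *_ = np.linalg.lstsq(X, dst, rcond=None)    # solution of shape (3, 2)
    return params.T                                     # [[a11, a12, tx], [a21, a22, ty]]


def ransac_affine(src, dst, thresh=3.0, iters=200, seed=0):
    """RANSAC affine estimation returning a 3x3 matrix, inlier mask, and process info."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    info = {"num_matches": n, "num_inliers": 0, "fit_success": False}
    if n < 3:
        return None, np.zeros(n, dtype=bool), info      # not enough pairs to fit
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(n, dtype=bool)
    for _ in range(iters):
        idx = rng.choice(n, size=3, replace=False)      # minimal sample for an affine model
        M = fit_affine_lstsq(src[idx], dst[idx])
        residuals = np.linalg.norm(src @ M[:, :2].T + M[:, 2] - dst, axis=1)
        mask = residuals < thresh
        if mask.sum() > best_mask.sum():
            best_mask = mask
    info["num_inliers"] = int(best_mask.sum())
    if info["num_inliers"] < 3:
        return None, best_mask, info                    # estimation failed
    M = fit_affine_lstsq(src[best_mask], dst[best_mask])  # refit on all inliers
    inlier_res = np.linalg.norm(
        src[best_mask] @ M[:, :2].T + M[:, 2] - dst[best_mask], axis=1)
    info.update(fit_success=True,
                median_residual=float(np.median(inlier_res)),
                residual_var=float(np.var(inlier_res)))
    return np.vstack([M, [0.0, 0.0, 1.0]]), best_mask, info
```

The returned `num_matches` and `num_inliers` are exactly the inputs that the inlier consistency index needs.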

[0036] Furthermore, an affine model is used to represent the global motion. For any point, $\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & t_x \\ a_{21} & a_{22} & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$. The symbols are defined as follows:

- (x, y) are the coordinates of the point in the previous video frame;
- (x', y') are the coordinates of the corresponding point in the current video frame;
- $a_{11}$, $a_{12}$, $a_{21}$, and $a_{22}$ form the linear part of the transformation and jointly encode the scaling, rotation, and shear between the video frames, i.e., the linear mapping of shape and orientation (for a pure scaling, $a_{11}$ and $a_{22}$ are the x- and y-scale factors and $a_{12} = a_{21} = 0$);
- $t_x$ and $t_y$ are the translations along the x-axis and y-axis, which compensate for horizontal and vertical camera movement.
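The sketch below applies the affine model to points and then to a bounding box. Warping a box through its four corners and taking the enclosing axis-aligned box is an assumption made for illustration; the source does not specify how a box is compensated.

```python
import numpy as np


def warp_points(A: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Map (N, 2) points from the previous frame to the current frame with a 3x3 affine matrix."""
    homo = np.hstack([pts, np.ones((len(pts), 1))])    # homogeneous coordinates (N, 3)
    return (homo @ A.T)[:, :2]


def warp_box(A: np.ndarray, box) -> np.ndarray:
    """Warp an (x, y, w, h) box via its four corners; return the enclosing axis-aligned box."""
    x, y, w, h = box
    corners = np.array([[x, y], [x + w, y], [x, y + h], [x + w, y + h]], dtype=float)
    warped = warp_points(A, corners)
    x0, y0 = warped.min(axis=0)
    x1, y1 = warped.max(axis=0)
    return np.array([x0, y0, x1 - x0, y1 - y0])
```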

[0037] During global camera motion estimation, the intermediate information required for confidence assessment is also generated and denoted as the estimation process information $\mathcal{E}_t$.

[0038] The estimation process information includes, but is not limited to, the following: (1) matching statistics, including the total number of matched point pairs and the number of inliers; (2) residual statistics, including the mean, median, or quantiles of the reprojection error of the RANSAC inliers, or the variance of the inlier residuals, where the variance of the inlier residuals is used to identify degenerate estimates that appear to have many inliers but actually have large errors;

[0039] (3) distribution coverage information, which represents the spatial coverage of the matched points (either all matched points or only the inliers) over the video frame, including the number of grid cells and the grid occupancy information;

[0040] (4) estimation stability information, including the difference between the transformations of adjacent frames (i.e., the difference between the global motion transformation matrices of adjacent frames), a fitting success flag, an iteration convergence flag, and the like;

[0041] (5) abnormal transformation constraint information, including the translation magnitude, rotation angle, scale change, and the like, which is used to suppress outlier transformations under strong jitter and blur.
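Item (5) can be illustrated by reading a translation magnitude, rotation angle, and isotropic scale directly off the affine parameters. The decomposition below is exact only for similarity transforms (rotation, uniform scale, and translation) and approximate when shear is present. The limit values in `is_abnormal` are hypothetical.

```python
import math

import numpy as np


def affine_motion_stats(A: np.ndarray):
    """Translation magnitude (px), rotation angle (deg), and isotropic scale of an affine matrix."""
    a11, a12, tx = A[0]
    a21, a22, ty = A[1]
    translation = math.hypot(tx, ty)
    rotation_deg = math.degrees(math.atan2(a21, a11))   # exact for similarity transforms
    scale = math.sqrt(abs(a11 * a22 - a12 * a21))       # square root of the area scale factor
    return translation, rotation_deg, scale


def is_abnormal(A: np.ndarray, max_shift: float = 80.0, max_rot_deg: float = 10.0,
                max_scale_change: float = 0.15) -> bool:
    """Flag transforms whose motion exceeds hypothetical physical limits."""
    shift, rot, scale = affine_motion_stats(A)
    return shift > max_shift or abs(rot) > max_rot_deg or abs(scale - 1.0) > max_scale_change
```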

[0042] Step S130: Calculate the confidence score of the current global motion transformation matrix based on the estimation process information.

[0043] The extracted estimation process information is analyzed quantitatively to assess whether the current global motion transformation matrix truly reflects the physical displacement of the camera. For example, the estimate may be corrupted because the background is occluded by large moving crowds, or because the image is blurred by severe shaking. A normalized confidence score with a value between 0 and 1 is then calculated.

[0044] In one embodiment, the confidence score is determined based at least on the inlier consistency index and the time continuity index. Correspondingly, at least two types of information from the estimation process information are selected for confidence assessment; preferably, matching statistics and estimation stability information are selected.

[0045] Specifically, the confidence score is obtained in three steps:

- an inlier consistency index is determined based on the total number of matched point pairs and the number of inliers in the matching statistics;
- a temporal continuity index is determined based on the difference between the current and previous global motion transformation matrices in the estimation stability information;
- the inlier consistency index and the temporal continuity index are weighted and fused to obtain the confidence score.

[0046] In another implementation, any of the following indices can be added to the inlier consistency index and the temporal continuity index to determine the confidence score, enhancing discrimination in strong jitter scenarios:

- an optical flow / matching success rate index;
- a spatial distribution index, which characterizes the spatial coverage of the matched points;
- residual statistics indices;
- abnormal transformation constraint indices.

[0047] Specifically, the confidence score is obtained in four steps:

- an inlier consistency index is determined based on the total number of matched point pairs and the number of inliers in the matching statistics;
- a temporal continuity index is determined based on the difference between the current and previous global motion transformation matrices in the estimation stability information;
- a distribution coverage index is determined based on the number of grid cells and the grid occupancy information in the distribution coverage information;
- the inlier consistency index, the temporal continuity index, and the distribution coverage index are weighted and fused to obtain the confidence score.

[0048] The specific execution process can be found in the following examples, which will not be elaborated here.

[0049] Step S140: Based on the confidence score, perform adaptive gating processing on the current global motion transformation matrix to generate the target compensation matrix.

[0050] Adaptive gating is applied to the current global motion transformation matrix based on the confidence score. If the confidence score is high, the current global motion transformation matrix is used in full; if it is low, the matrix is used only partially or discarded entirely. The final output is a safety-filtered target compensation matrix.

[0051] Step S150: Based on the confidence score and the target compensation matrix, perform data association and trajectory update on the detection box set and the predicted trajectory set to obtain the target association result.

[0052] The target compensation matrix is applied to align the coordinate systems of the predicted trajectory set and the detection box set. At the same time, the confidence score serves as a global degradation signal that is fed into the data association and trajectory update stages to adjust the matching strategy. The final output is the target association result, i.e., the continuous trajectories of the multiple targets.

[0053] Specifically, the predicted trajectory set is compensated based on the target compensation matrix; the geometric distance is calculated based on the detection box set and the compensated predicted trajectory set; the appearance feature vectors corresponding to each detection box in the detection box set are extracted to obtain the appearance feature set; the appearance distance is calculated based on the appearance feature set and the historical appearance feature vectors of each trajectory in the predicted trajectory set; the weights of the geometric distance and the appearance distance are adjusted based on the confidence score; the association cost is calculated based on the adjusted weights, geometric distance, and appearance distance; the matching calculation is performed on the detection box set and the predicted trajectory set based on the association cost to obtain the matching result; and the trajectory is updated based on the matching result to determine the target association result. The specific execution process can be referred to in the following embodiment, which will not be elaborated here.
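The association step of [0053] can be sketched with the following choices:

- geometric distance is 1 − IoU between detections and compensated predicted boxes;
- appearance distance is the cosine distance between feature vectors;
- the two distances are weighted according to the confidence score.

The linear weight schedule (trusting geometry more when the confidence score is high) and all default values are assumptions; the source says only that the weights are adjusted according to the confidence score. Greedy matching stands in for the Hungarian algorithm (for example, `scipy.optimize.linear_sum_assignment`) to keep the sketch dependency-free.

```python
import numpy as np


def iou_matrix(dets: np.ndarray, trks: np.ndarray) -> np.ndarray:
    """Pairwise IoU between (N, 4) and (M, 4) boxes in (x, y, w, h) format."""
    d, t = dets[:, None, :], trks[None, :, :]
    x0 = np.maximum(d[..., 0], t[..., 0])
    y0 = np.maximum(d[..., 1], t[..., 1])
    x1 = np.minimum(d[..., 0] + d[..., 2], t[..., 0] + t[..., 2])
    y1 = np.minimum(d[..., 1] + d[..., 3], t[..., 1] + t[..., 3])
    inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
    union = d[..., 2] * d[..., 3] + t[..., 2] * t[..., 3] - inter
    return inter / np.maximum(union, 1e-9)


def cosine_distance(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine distance between (N, D) and (M, D) feature vectors."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T


def association_cost(dets, det_feats, trks, trk_feats, score,
                     w_geo_min=0.2, w_geo_max=0.8):
    """Confidence-weighted cost; trks are the already-compensated predicted boxes."""
    # Assumption: rely more on geometry when the global motion estimate is trusted.
    w_geo = w_geo_min + (w_geo_max - w_geo_min) * score
    geo = 1.0 - iou_matrix(dets, trks)
    app = cosine_distance(det_feats, trk_feats)
    return w_geo * geo + (1.0 - w_geo) * app


def greedy_match(cost: np.ndarray, max_cost: float = 0.7):
    """Greedy stand-in for Hungarian matching: repeatedly take the cheapest free pair."""
    matches, used_d, used_t = [], set(), set()
    for flat in np.argsort(cost, axis=None):
        i, j = (int(v) for v in np.unravel_index(flat, cost.shape))
        if cost[i, j] > max_cost:
            break                                        # remaining pairs are too costly
        if i in used_d or j in used_t:
            continue
        matches.append((i, j))
        used_d.add(i)
        used_t.add(j)
    unmatched_dets = [i for i in range(cost.shape[0]) if i not in used_d]
    unmatched_trks = [j for j in range(cost.shape[1]) if j not in used_t]
    return matches, unmatched_dets, unmatched_trks
```

The three outputs correspond to the matched pairs, unmatched detection boxes, and unmatched predicted trajectories that [0012] then handles.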

[0054] The multi-target tracking method provided in this invention first acquires a video frame sequence, performs target detection on the current video frame to obtain a set of detection boxes, and performs state prediction on the historical trajectory set to obtain a set of predicted trajectories. Then, it estimates the current global motion transformation matrix between the current video frame and the previous video frame in the video frame sequence and extracts the estimation process information. Based on the estimation process information, it calculates the confidence score of the current global motion transformation matrix, achieving dynamic perception and quantification of the global motion estimation quality, overcoming the shortcomings of existing technologies that blindly rely on compensation results in complex environments. Next, it performs adaptive gating processing on the current global motion transformation matrix based on the confidence score to generate a target compensation matrix. This enables adaptive interception, weakening, or correction of abnormal matrices in high-frequency jitter or attitude change scenarios such as robot dog inspections, blocking the injection of erroneous global displacement interference at its source. Finally, based on the confidence score and the target compensation matrix, data association and trajectory update are performed on the detection box set and the predicted trajectory set to obtain the target association result. The reliability assessment of global motion is deeply integrated into the underlying tracking logic, which suppresses the error injection and cumulative propagation caused by motion compensation failure in strong jitter scenarios, reduces trajectory drift and identity switching, and improves the stability and continuity of multi-target tracking.

[0055] Based on any of the above embodiments, the estimation process information includes at least matching statistics, estimation stability information, and distribution coverage information. Figure 2 is the second flowchart of the multi-target tracking method provided by the present invention. As shown in Figure 2, step S130 includes steps S131, S132, S133, and S134.

[0056] Step S131: Determine the inlier consistency index based on the total number of matched point pairs and the number of inliers in the matching statistics.

[0057] Substitute the total number of matching point pairs and the number of inliers from the matching statistics into the following formula to calculate the inlier consistency index.

[0058] $C_{in} = \frac{N_{in}}{N_{match} + \epsilon}$, where $C_{in}$ is the inlier consistency index, $N_{in}$ is the number of inliers, $N_{match}$ is the total number of matched point pairs, and $\epsilon$ is a small constant (preferably …) that prevents division by zero.
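As a worked example with hypothetical counts, consider the inlier consistency index $C_{in} = N_{in} / (N_{match} + \epsilon)$ for two frames, each with 200 matched pairs: (a) a frame with a well-textured static background, and (b) a frame dominated by a moving crowd. Because ε is tiny, it does not materially affect either value:

$$C_{in}^{(a)} = \frac{150}{200 + \epsilon} \approx 0.75, \qquad C_{in}^{(b)} = \frac{12}{200 + \epsilon} \approx 0.06$$

In case (b), most matched pairs disagree with the fitted transform, so this index pulls the confidence score toward zero.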

[0059] Step S132: Determine the temporal continuity index based on the difference between the current global motion transformation matrix and the previous global motion transformation matrix in the estimation stability information.

[0060] Because the physical motion of the camera has inertia, the global motion transformation matrices of two adjacent frames should not change abruptly. The degree of difference between the current and previous matrices can be characterized by the Frobenius norm of their difference. So that a larger difference yields a score closer to zero, a negative exponential decay is used to calculate the temporal continuity index:

$C_{time} = \exp\left(-\frac{\|A_t - A_{t-1}\|_F}{\sigma}\right)$

The symbols are defined as follows:

- $C_{time}$ is the temporal continuity index (also called the temporal smoothness index);
- $\exp(\cdot)$ is the natural exponential function;
- $\|\cdot\|_F$ is the Frobenius norm;
- $A_t$ is the current global motion transformation matrix;
- $A_{t-1}$ is the previous global motion transformation matrix;
- $\sigma$ is a scale parameter, preferably 0.5 to 5.

When the image size is larger and the motion is more intense, $\sigma$ can be increased appropriately to avoid excessive penalization.
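A direct transcription of the temporal continuity index is shown below. It assumes the form exp(−‖A_t − A_{t−1}‖_F / σ), matrices stored as 3×3 homogeneous arrays, and σ = 2.0 as an example value from the preferred 0.5 to 5 range. The Frobenius norm mixes the unitless linear parameters with translations measured in pixels, so translation changes typically dominate. This is one reason σ may need to grow with image size and motion intensity.

```python
import numpy as np


def temporal_continuity(A_t: np.ndarray, A_prev: np.ndarray, sigma: float = 2.0) -> float:
    """C_time = exp(-||A_t - A_prev||_F / sigma), a value in (0, 1]."""
    diff = np.linalg.norm(A_t - A_prev, ord="fro")   # Frobenius norm of the change
    return float(np.exp(-diff / sigma))
```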

[0061] Step S133: Determine the distribution coverage rate index based on the number of grid divisions and grid occupancy information in the distribution coverage information.

[0062] The current video frame is divided into $G$ grid cells. The grid occupancy information is $o_g \in \{0, 1\}$, $g = 1, \dots, G$, and indicates whether the $g$-th cell contains a matching point. The matching points counted may be all matching points or only inliers. The distribution coverage rate is calculated as $S_{\text{cov}} = \dfrac{1}{G}\sum_{g=1}^{G} o_g$, where $S_{\text{cov}}$ is the distribution coverage rate.
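The coverage rate can be sketched as follows. The sketch assumes the matched points are given as pixel coordinates and the frame is split into a uniform grid. The 4x4 default grid is a hypothetical choice, not the preferred size from this application.

```python
def distribution_coverage(points, width, height, rows=4, cols=4):
    """Return S_cov = (1/G) * sum_g o_g for a rows x cols grid (G = rows * cols).

    points: (x, y) pixel coordinates of matched points (or of inliers only).
    """
    occupied = set()
    for x, y in points:
        if 0 <= x < width and 0 <= y < height:
            r = min(int(y * rows / height), rows - 1)
            c = min(int(x * cols / width), cols - 1)
            occupied.add((r, c))  # o_g = 1 for this cell
    return len(occupied) / (rows * cols)


# Three points: two in the top-left cell and one in the bottom-right cell
print(distribution_coverage([(10, 10), (20, 20), (630, 470)], 640, 480))  # 0.125
```

Clustered matches (for example, all on one textured wall) give a low score even when the inlier ratio is high. This is why the coverage index complements the inlier consistency index.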

[0063] In one implementation, the current video frame is divided into a preset grid of cells with a preferred default size. High-resolution or wide-field-of-view video frames can be divided into a finer grid.

[0064] Step S134: Perform weighted fusion of the inlier consistency index, the temporal continuity index, and the distribution coverage index to obtain the confidence score.

[0065] After the inlier consistency index, temporal continuity index, and distribution coverage index are calculated, they are weighted and fused to obtain the confidence score.

[0066] The multi-target tracking method provided in this invention combines three indicators (matching quality, temporal continuity, and spatial distribution) to evaluate the reliability of the current global motion transformation matrix. This avoids misjudgments caused by relying on any single indicator and makes the confidence score more accurate.

[0067] Based on any of the above embodiments, step S134 includes: step S1341 and step S1342.

[0068] Step S1341: Compute the weighted sum of the inlier consistency index, the temporal continuity index, and the distribution coverage index to obtain the basic fusion value.

[0069] Step S1342: Process the basic fusion value using a preset function to obtain the confidence score; wherein the preset function is any one of a truncation function, a piecewise function, or a nonlinear mapping function.

[0070] First, the inlier consistency index, temporal continuity index, and distribution coverage index are weighted and summed to obtain the basic fusion value. Then the basic fusion value is processed using the preset function to obtain the confidence score.

[0071] In one implementation, the preset function is a truncation function that restricts the basic fusion value to a preset confidence range, such as [0, 1], to obtain the confidence score: $C_t = \operatorname{clip}_{[0,1]}\left(w_1 S_{\text{in}} + w_2 S_{\text{temp}} + w_3 S_{\text{cov}}\right)$. Here $C_t$ is the confidence score, and $\operatorname{clip}_{[0,1]}(\cdot)$ is the truncation function that limits its argument to $[0, 1]$. $w_1$, $w_2$, and $w_3$ are weighting coefficients, preferably with $w_1 + w_2 + w_3 = 1$. In high-jitter inspection scenarios, preferred default values are $w_1 = 0.5$, $w_2 = 0.3$, and $w_3 = 0.2$. When the background texture is sparse or the foreground occupies a large proportion of the frame, $w_3$ is preferably increased (e.g., to 0.3 to 0.4) to strengthen the coverage constraint.
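Below is a short sketch of the truncated weighted fusion, using the default weights w1 = 0.5, w2 = 0.3, w3 = 0.2 stated above. The function name and argument names are illustrative.

```python
def confidence_score(s_in: float, s_temp: float, s_cov: float,
                     weights=(0.5, 0.3, 0.2)) -> float:
    """Weighted sum of the three indices, truncated to [0, 1]."""
    w1, w2, w3 = weights
    base = w1 * s_in + w2 * s_temp + w3 * s_cov  # basic fusion value
    return min(max(base, 0.0), 1.0)              # truncation function clip_[0,1]


print(confidence_score(0.9, 0.8, 0.5))  # about 0.79 with the default weights
```

If the weights do not sum to 1, the truncation step is what keeps the score inside the confidence range.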

[0072] In another implementation, the preset function can be a piecewise function or a nonlinear mapping function. Such a function suppresses outliers in the basic fusion value when producing the confidence score, which strengthens anomaly suppression. For example, a nonlinear mapping can use a Sigmoid (S-shaped) function or one of its variants. This mapping compresses low-quality marginal values and amplifies differences among high-quality values. As a result, the system reduces confidence more decisively in critical states such as blur or jitter, and errors are kept from propagating into the multi-target tracking system.
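One possible nonlinear mapping is sketched below: a logistic function centered at a hypothetical value `center`, with steepness `slope`. Values below the center are pushed toward 0 and values above it toward 1, which gives the decisive confidence reduction described above. The parameter values are assumptions for illustration only.

```python
import math


def sigmoid_confidence(base: float, center: float = 0.5, slope: float = 10.0) -> float:
    """Map the basic fusion value to (0, 1) with a logistic (Sigmoid) curve.

    center and slope are hypothetical tuning parameters.
    """
    z = slope * (base - center)
    if z >= 0:  # numerically stable logistic for both signs of z
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


for b in (0.3, 0.5, 0.7):
    print(b, round(sigmoid_confidence(b), 3))
```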

[0073] The multi-target tracking method provided in this invention can further improve the accuracy of the confidence score by processing the basic fusion value obtained by weighted summation of multi-dimensional indicators using a truncation function, a piecewise function, or a nonlinear mapping function.

[0074] Based on any of the above embodiments, step S140 includes: step S141, step S142 and step S143.

[0075] It should be noted that steps S141, S142 and S143 need not be executed in any particular order.

[0076] Step S141: When the confidence score is greater than or equal to the first preset threshold, the current global motion transformation matrix is used as the target compensation matrix.

[0077] Based on the confidence score, three-state gating is applied to the global motion compensation (GMC) result. For this purpose, a first preset threshold $\tau_1$ and a second preset threshold $\tau_2$ are set, with $\tau_1 > \tau_2$. Furthermore, the first preset threshold $\tau_1$ can be configured according to the scale parameter $\sigma$, so that the strictness of the gate adapts dynamically to the intensity of scene motion.

[0078] In one embodiment, the first preset threshold $\tau_1 = 0.75$ and the second preset threshold $\tau_2 = 0.4$.

[0079] When the confidence score is greater than or equal to the first preset threshold, a high-confidence state is determined and the current global motion transformation matrix can be used in full. That is, the current global motion transformation matrix is used as the target compensation matrix.

[0080] Step S142: When the confidence score is less than the first preset threshold and greater than the second preset threshold, adaptive attenuation processing is performed on the current global motion transformation matrix to obtain the target compensation matrix.

[0081] When the confidence score is less than the first preset threshold and greater than the second preset threshold, a medium-confidence state is determined and the current global motion transformation matrix can be partially used. In this case, adaptive attenuation processing is performed on the current global motion transformation matrix to obtain the target compensation matrix.

[0082] Furthermore, adaptive attenuation processing methods include, but are not limited to: matrix interpolation, compensation scaling, or component selection.

[0083] In one embodiment, the matrix interpolation process is as follows: based on the confidence score, a first preset threshold, and a second preset threshold, a target gating coefficient is determined; using the target gating coefficient, interpolation is performed between the current global motion transformation matrix and the unit transformation matrix to generate a target compensation matrix.

[0084] In another embodiment, the compensation scaling process is as follows: the translation vector or rotation angle in the current global motion transformation matrix is directly scaled proportionally to obtain the target compensation matrix.

[0085] In another embodiment, the component selection (degree-of-freedom reduction) process is as follows: the rotation transformation components and / or scaling transformation components in the current global motion transformation matrix are filtered out, and only the translation transformation components are retained to construct the target compensation matrix.
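The compensation scaling and component selection variants can be sketched as follows. The sketch assumes the global motion transformation matrix is a 3x3 affine matrix in homogeneous image coordinates (last row [0, 0, 1]). Only translation scaling is shown; scaling the rotation angle would first require decomposing the 2x2 linear part, which is omitted. The function names are hypothetical.

```python
import numpy as np


def scale_translation(h: np.ndarray, k: float) -> np.ndarray:
    """Compensation scaling: multiply only the translation (tx, ty) by k in [0, 1]."""
    out = np.array(h, dtype=float)  # copy so the input matrix is not modified
    out[:2, 2] *= k
    return out


def translation_only(h: np.ndarray) -> np.ndarray:
    """Component selection: drop rotation/scale, keep only the translation."""
    out = np.eye(3)
    out[:2, 2] = np.asarray(h, dtype=float)[:2, 2]
    return out


theta = np.deg2rad(5.0)
h = np.array([[np.cos(theta), -np.sin(theta), 10.0],
              [np.sin(theta),  np.cos(theta), -4.0],
              [0.0, 0.0, 1.0]])
print(scale_translation(h, 0.5)[:2, 2])  # translation halved to (5, -2)
print(translation_only(h))
```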

[0086] Step S143: When the confidence score is less than or equal to the second preset threshold, the unit transformation matrix is determined as the target compensation matrix.

[0087] When the confidence score is less than or equal to the second preset threshold, a low-confidence state is determined and the current global motion transformation matrix is disabled. In this case, the unit transformation matrix is directly determined as the target compensation matrix.

[0088] The multi-target tracking method provided in this invention performs adaptive gating processing on motion compensation based on the relationship between the confidence score and the first preset threshold and the second preset threshold, realizing full, partial or disabled adaptive compensation. This allows the system to use full compensation when the compensation is reliable to improve the correlation stability, and to reduce the compensation intensity or even disable compensation when the compensation is unreliable, thereby suppressing error injection and cumulative propagation.

[0089] Based on any of the above embodiments, Figure 3 is the third flowchart of the multi-target tracking method provided by the present invention. As shown in Figure 3, step S142 includes step S1431 and step S1432.

[0090] Step S1431: Determine the target gating coefficient based on the confidence score, the first preset threshold, and the second preset threshold.

[0091] A gating coefficient $\alpha = f(C_t)$ is predefined, where $\alpha$ is the gating coefficient and $f(\cdot)$ is the gating coefficient function. It should be noted that $f$ is monotonically increasing: the higher the confidence score, the larger the calculated gating coefficient.

[0092] Substituting the confidence score into the gating coefficient function above, we obtain the target gating coefficient.

[0093] In one implementation, the gating coefficient function is as follows: .

[0094] Step S1432: Using the target gating coefficient, interpolation is performed between the current global motion transformation matrix and the unit transformation matrix to generate the target compensation matrix.

[0095] The adaptive compensation matrix is constructed in advance as $\tilde{H}_t = \alpha H_t + (1 - \alpha) I$. Here $\tilde{H}_t$ is the target compensation matrix; $H_t$ is the current global motion transformation matrix, that is, the global motion transformation matrix at the current time $t$; and $I$ is the unit (identity) transformation matrix, with the same dimensions as $H_t$.

[0096] The target gating coefficient is substituted into the above formula for the target compensation matrix. This interpolates between the current global motion transformation matrix and the unit transformation matrix to generate the target compensation matrix.
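The three-state gating and matrix interpolation can be combined in one function, as sketched below. The gating coefficient function is assumed to be a hypothetical linear ramp, alpha = (C_t - tau2) / (tau1 - tau2) clipped to [0, 1], which is monotonically increasing as required. The thresholds default to the example values tau1 = 0.75 and tau2 = 0.4. The function names are illustrative.

```python
import numpy as np


def gating_coefficient(conf: float, tau1: float, tau2: float) -> float:
    """Hypothetical monotone f(C_t): linear ramp from 0 at tau2 to 1 at tau1."""
    if not tau1 > tau2:
        raise ValueError("expected tau1 > tau2")
    return min(max((conf - tau2) / (tau1 - tau2), 0.0), 1.0)


def target_compensation(h: np.ndarray, conf: float,
                        tau1: float = 0.75, tau2: float = 0.4) -> np.ndarray:
    """Three-state gating; the medium state interpolates H_t toward the identity."""
    identity = np.eye(h.shape[0])
    if conf >= tau1:   # high confidence (S141): use H_t in full
        return np.array(h, dtype=float)
    if conf <= tau2:   # low confidence (S143): disable compensation
        return identity
    alpha = gating_coefficient(conf, tau1, tau2)  # medium confidence (S142)
    return alpha * h + (1.0 - alpha) * identity   # alpha * H_t + (1 - alpha) * I


h = np.eye(3)
h[0, 2] = 10.0  # 10-pixel horizontal camera shift
print(target_compensation(h, 0.575)[0, 2])  # roughly 5.0: half of the shift is applied
```

With this ramp, alpha is 1 at tau1 and 0 at tau2. The applied compensation therefore varies continuously across both thresholds instead of jumping.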

[0097] Furthermore, the target compensation matrix is preferably used in the geometric computations of the data association stage to align coordinates. It can also be used to compensate the predicted trajectory set or the detection box set before association is performed.

[0098] The multi-target tracking method provided in this invention determines the target compensation matrix through interpolation, avoiding abrupt changes in compensation amount caused by a one-size-fits-all threshold, thereby ensuring the spatial smoothness of the predicted trajectory.

[0099] Based on any of the above embodiments, Figure 4 is the fourth flowchart of the multi-target tracking method provided by the present invention. As shown in Figure 4, step S150 includes steps S151, S152, S153, S154, S155, S156, S157 and S158.

[0100] Step S151: Based on the target compensation matrix, compensate the predicted trajectory set.

[0101] Step S152: Calculate the geometric distance based on the set of detection boxes and the compensated set of predicted trajectories.

[0102] After determining the target compensation matrix, the predicted trajectory set is compensated based on the target compensation matrix to obtain the compensated predicted trajectory set.

[0103] It should be noted that the predicted trajectory set includes, but is not limited to: target ID, trajectory ID, predicted bounding box position and predicted velocity of the current video frame, and historical appearance features. The target compensation matrix compensates for geometric position deviations caused by global camera motion; therefore, the compensation operation only applies to the spatial position state of each trajectory in the predicted trajectory set, such as predicted bounding box position and predicted velocity.
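Below is a sketch of applying the target compensation matrix to the spatial state of predicted trajectories. It assumes boxes in [x1, y1, x2, y2] pixel format and a 3x3 affine matrix. Box corners are mapped with the full matrix and the axis-aligned bounds are re-taken; this is exact for translation and axis-aligned scaling but only approximate under rotation. Velocities are displacements, so they are mapped with the 2x2 linear part only. The function names are hypothetical.

```python
import numpy as np


def compensate_boxes(boxes, h: np.ndarray) -> np.ndarray:
    """Map (N, 4) [x1, y1, x2, y2] boxes with a 3x3 matrix; returns axis-aligned boxes."""
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    n = len(boxes)
    corners = np.vstack([boxes[:, :2], boxes[:, 2:]])        # (2N, 2): top-left, then bottom-right
    homog = np.hstack([corners, np.ones((2 * n, 1))]) @ h.T   # apply H to each point
    warped = homog[:, :2] / homog[:, 2:3]                    # perspective divide (no-op for affine)
    p1, p2 = warped[:n], warped[n:]
    return np.hstack([np.minimum(p1, p2), np.maximum(p1, p2)])


def compensate_velocities(velocities, h: np.ndarray) -> np.ndarray:
    """Velocities are displacements, so only the 2x2 linear part of H applies."""
    return np.asarray(velocities, dtype=float).reshape(-1, 2) @ h[:2, :2].T


h = np.array([[1.0, 0.0, 5.0], [0.0, 1.0, -3.0], [0.0, 0.0, 1.0]])  # pure camera shift
print(compensate_boxes([[0, 0, 10, 20]], h))   # box shifted by (5, -3)
print(compensate_velocities([[1.0, 2.0]], h))  # unchanged by a pure translation
```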

[0104] Then, based on the positions of the detection boxes in the detection box set and the predicted boxes in the compensated predicted trajectory set, the geometric distance is calculated. The smaller the geometric distance, the closer the spatial locations.

[0105] In one implementation, the geometric distance can be characterized using the Intersection over Union (IoU) or the Mahalanobis distance. For IoU, the distance can be taken as $1 - \text{IoU}$, so that a smaller value means greater overlap.
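For the IoU option, a pairwise geometric distance matrix of 1 - IoU can be sketched as follows. Boxes are in [x1, y1, x2, y2] format; rows are predicted trajectories and columns are detection boxes. This is an illustrative NumPy implementation, not a specific library's API.

```python
import numpy as np


def iou_distance(tracks, dets) -> np.ndarray:
    """Pairwise 1 - IoU between (M, 4) trajectory boxes and (N, 4) detection boxes."""
    a = np.asarray(tracks, dtype=float).reshape(-1, 4)[:, None, :]  # (M, 1, 4)
    b = np.asarray(dets, dtype=float).reshape(-1, 4)[None, :, :]    # (1, N, 4)
    iw = np.clip(np.minimum(a[..., 2], b[..., 2]) - np.maximum(a[..., 0], b[..., 0]), 0, None)
    ih = np.clip(np.minimum(a[..., 3], b[..., 3]) - np.maximum(a[..., 1], b[..., 1]), 0, None)
    inter = iw * ih
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    union = area_a + area_b - inter
    iou = np.where(union > 0, inter / np.maximum(union, 1e-12), 0.0)
    return 1.0 - iou  # 0 for identical boxes, 1 for disjoint boxes


print(iou_distance([[0, 0, 10, 10]], [[0, 0, 10, 10], [5, 0, 15, 10], [20, 20, 30, 30]]))
```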

[0106] Step S153: Extract the appearance feature vectors corresponding to each detection box in the detection box set to obtain the appearance feature set.

[0107] The appearance feature vector of each detection box in the detection box set is extracted as follows: $\mathcal{F}_t = \{ f_t^{1}, f_t^{2}, \dots, f_t^{N_t} \}$, where:
- $\mathcal{F}_t$ is the set of appearance features corresponding to the detection boxes at the current time $t$;
- $N_t$ is the number of detection boxes in the video frame at the current time $t$, i.e., the number of elements in the detection box set $\mathcal{D}_t$;
- $f_t^{j} \in \mathbb{R}^{d}$ is the appearance feature vector of the $j$-th detected target, where $\mathbb{R}$ denotes the set of real numbers and $d$ the feature dimension.

[0108] Step S154: Calculate the appearance distance based on the historical appearance feature vectors of each trajectory in the appearance feature set and the predicted trajectory set.

[0109] It should be noted that the historical appearance feature vectors stored in each trajectory of the predicted trajectory set represent intrinsic visual characteristics of the target, such as color and texture. These are not affected by the global geometric motion of the camera. Therefore, the original historical appearance feature vectors in the predicted trajectory set can be used directly when calculating the appearance distance.

[0110] The appearance distance is calculated between each appearance feature vector in the appearance feature set and the historical appearance feature vector of each trajectory in the predicted trajectory set. The smaller the appearance distance, the more similar the appearance features.

[0111] In one implementation, the appearance distance can be characterized by cosine similarity, for example as 1 minus the cosine similarity, so that a smaller value means more similar features.
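The appearance distance can be sketched as 1 minus the cosine similarity. The sketch assumes the historical appearance feature of each trajectory is stored as a single d-dimensional vector (for example, an averaged or most recent embedding, which is an assumption here). The function name is hypothetical.

```python
import numpy as np


def cosine_distance(track_feats, det_feats, eps: float = 1e-12) -> np.ndarray:
    """Pairwise 1 - cosine similarity between (M, d) trajectory and (N, d) detection features."""
    t = np.asarray(track_feats, dtype=float)
    d = np.asarray(det_feats, dtype=float)
    t = t / np.maximum(np.linalg.norm(t, axis=1, keepdims=True), eps)  # L2-normalize rows
    d = d / np.maximum(np.linalg.norm(d, axis=1, keepdims=True), eps)
    return 1.0 - t @ d.T  # values in [0, 2]; 0 means the same direction


print(cosine_distance([[1.0, 0.0]], [[2.0, 0.0], [0.0, 3.0], [-1.0, 0.0]]))
```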

[0112] Step S155: Based on the confidence score, adjust the weights of the geometric distance and the appearance distance.

[0113] The weights of the geometric distance and the appearance distance are adjusted based on the confidence score.

[0114] In one embodiment, when the confidence score is less than the second preset threshold, i.e., in the low-confidence state, the weight of the geometric distance is reduced and the weight of the appearance distance is increased.

[0115] In another embodiment, when the confidence score is greater than the first preset threshold, i.e., in the high-confidence state, the weight of the geometric distance is increased and the weight of the appearance distance is decreased.

[0116] By making the above adjustments, the matching stability under crowded, occluded, and strong jitter conditions can be improved, and false associations can be reduced.

[0117] Step S156: Calculate the association cost based on the adjusted weights, the geometric distance, and the appearance distance.

[0118] The adjusted weights include: the adjusted weights for geometric distance and the adjusted weights for appearance distance.

[0119] The association cost is calculated from the adjusted weights, the geometric distance, and the appearance distance as $c_{ij} = w_g(C_t)\, d^{\text{geo}}_{ij} + w_a(C_t)\, d^{\text{app}}_{ij}$, where:
- $c_{ij}$ is the association cost between the $i$-th predicted trajectory $T_i$ and the $j$-th detection box $D_j$;
- $d^{\text{geo}}_{ij}$ is the geometric distance between $T_i$ and $D_j$;
- $d^{\text{app}}_{ij}$ is the appearance distance between $T_i$ and the appearance feature vector $f_t^{j}$ of the $j$-th detection box;
- $w_g(C_t)$ is the weight of the geometric distance as a function of the confidence score, i.e., the geometric-distance weight adjusted based on the confidence score;
- $w_a(C_t)$ is the weight of the appearance distance as a function of the confidence score, i.e., the appearance-distance weight adjusted based on the confidence score.
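The confidence-dependent weights and the association cost can be sketched as below. The weight functions w_g(C_t) and w_a(C_t) are not given in closed form here, so the sketch assumes a hypothetical linear schedule. The geometric weight rises from `w_geo_low` to `w_geo_high` as the confidence moves from tau2 to tau1, and the appearance weight is its complement. All numeric defaults are illustrative.

```python
import numpy as np


def association_weights(conf, tau1=0.75, tau2=0.4, w_geo_low=0.3, w_geo_high=0.8):
    """Hypothetical w_g(C_t) and w_a(C_t): geometry is trusted more as confidence rises."""
    ramp = min(max((conf - tau2) / (tau1 - tau2), 0.0), 1.0)
    w_geo = w_geo_low + ramp * (w_geo_high - w_geo_low)
    return w_geo, 1.0 - w_geo


def association_cost(d_geo, d_app, conf, **weight_kwargs):
    """c_ij = w_g(C_t) * d_geo_ij + w_a(C_t) * d_app_ij for all trajectory/detection pairs."""
    w_geo, w_app = association_weights(conf, **weight_kwargs)
    return w_geo * np.asarray(d_geo, dtype=float) + w_app * np.asarray(d_app, dtype=float)


# One trajectory, two candidate detections (hypothetical distances)
d_geo = np.array([[0.9, 0.2]])  # jitter makes detection 1 look geometrically closer
d_app = np.array([[0.1, 0.8]])  # but detection 0 looks like the tracked target
print(association_cost(d_geo, d_app, conf=0.9))  # lower cost for detection 1 (geometry dominates)
print(association_cost(d_geo, d_app, conf=0.2))  # lower cost for detection 0 (appearance dominates)
```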

[0120] Step S157: Based on the association cost, perform matching calculations on the detection box set and the predicted trajectory set to obtain the matching result.

[0121] Based on the calculated association cost, a matching calculation is performed on the set of detection boxes and the set of predicted trajectories to obtain the matching result.

[0122] Furthermore, when performing matching calculations, the Hungarian Algorithm or the KM (Kuhn-Munkres) algorithm can be used to perform bipartite graph optimal matching on the association cost to obtain the matching results. The matching results include successfully matched target detection boxes and target predicted trajectories, unmatched predicted trajectories, and unmatched detection boxes.
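Below is a sketch of the bipartite matching step using SciPy's `scipy.optimize.linear_sum_assignment`, which solves the (possibly rectangular) linear assignment problem. The sketch also rejects assigned pairs whose cost exceeds `max_cost`. This is a common practice added here as an assumption, with a hypothetical default; it produces the unmatched sets described above when a forced assignment would be implausible.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match(cost, max_cost=0.7):
    """Return (matches, unmatched_trajectories, unmatched_detections) for an (M, N) cost."""
    cost = np.asarray(cost, dtype=float)
    rows, cols = linear_sum_assignment(cost)  # minimum-total-cost assignment
    matches = [(int(r), int(c)) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]
    matched_rows = {r for r, _ in matches}
    matched_cols = {c for _, c in matches}
    unmatched_tracks = [i for i in range(cost.shape[0]) if i not in matched_rows]
    unmatched_dets = [j for j in range(cost.shape[1]) if j not in matched_cols]
    return matches, unmatched_tracks, unmatched_dets


# Three predicted trajectories, two detection boxes
print(match([[0.1, 0.9], [0.8, 0.2], [0.95, 0.99]]))
```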

[0123] Step S158: Based on the matching result, perform trajectory update to determine the target association result.

[0124] In one embodiment, based on the position of the target detection box, the target predicted trajectory is filtered and updated to obtain the target trajectory after the state update; unmatched predicted trajectories are marked as lost states to obtain the trajectory in the lost state; and unmatched detection boxes are initialized as new trajectories; wherein, the target association result includes at least one of the target trajectory after the state update, the trajectory in the lost state, and the new trajectory.

[0125] In another embodiment, the trajectory update depends on a preset confidence downgrade threshold. When the confidence score is greater than this threshold:
- the target predicted trajectory undergoes a filter state update based on the position of the target detection box, giving the updated target trajectory;
- unmatched predicted trajectories are marked as lost, giving lost trajectories;
- unmatched detection boxes are initialized as new trajectories.

When the confidence score is less than or equal to the preset confidence downgrade threshold, the same three operations are performed with these changes:
- the process noise in the filter state update is increased;
- the retention time of unmatched predicted trajectories is extended;
- the conditions for establishing new trajectories are made stricter.

The target association result includes at least one of the updated target trajectory, the lost trajectory, and the new trajectory.

[0126] Furthermore, the target association results can also include the target identity identifier.

[0127] The multi-target tracking method provided in this invention dynamically adjusts the composition and weights of the data association cost based on the confidence score. It calculates the association cost from the adjusted weights, the geometric distance, and the appearance distance, and performs matching on that cost to determine the target association result. This achieves dynamic, flexible fusion of multimodal features. In scenarios with severe shaking, a target's position can change abruptly, and relying solely on position matching can cause tracking to fail. The method instead automatically shifts its reliance toward appearance features, which improves multi-target tracking in complex dynamic scenes.

[0128] Based on any of the above embodiments, the matching result includes a successfully matched target detection box and target prediction trajectory, an unmatched prediction trajectory and an unmatched detection box, and the step of determining the target association result based on the matching result further includes the following steps S1581, S1582 and / or S1583.

[0129] Step S1581: Based on the position of the target detection box, perform a filter state update on the target prediction trajectory; when the confidence score is less than or equal to a preset confidence downgrade threshold, increase the process noise during the filter state update process.

[0130] The preset confidence downgrade threshold can be set equal to the second preset threshold mentioned above, or set to a different value.

[0131] When the confidence score is greater than the preset confidence downgrade threshold, a filter state update is performed on the target predicted trajectory based on the position of the target detection box. When the confidence score is less than or equal to the preset confidence downgrade threshold, the same filter state update is performed, and the process noise in the update is increased. It should be understood that other measures can be used besides increasing the process noise, such as reducing the update strength or freezing some state components.

[0132] The above method prevents the target's velocity and acceleration state variables from being skewed by erroneous detection boxes with large displacement errors, which keeps the filter state smooth and stable.

[0133] Step S1582: Mark the unmatched predicted trajectory as lost, and extend the retention time of the unmatched predicted trajectory when the confidence score is less than or equal to the preset confidence downgrade threshold.

[0134] When the confidence score is greater than the preset confidence downgrade threshold, the unmatched predicted trajectory is marked as lost; when the confidence score is less than or equal to the preset confidence downgrade threshold, the unmatched predicted trajectory is marked as lost, and the retention time of the unmatched predicted trajectory is extended. For example, instead of the usual deletion after 30 frames, the retention time is extended to 60 or even 90 frames.

[0135] By using the above methods, frequent track breaks and ID switching caused by short-term jitter can be avoided.

[0136] Step S1583: Initialize the unmatched detection box as a new trajectory, and when the confidence score is less than or equal to the preset confidence downgrade threshold, tighten the conditions for establishing the new trajectory.

[0137] When the confidence score is greater than the preset confidence downgrade threshold, the unmatched detection box is initialized as a new trajectory. When the confidence score is less than or equal to the preset confidence downgrade threshold, the unmatched detection box is initialized as a new trajectory, and the conditions for establishing a new trajectory are tightened. For example, three consecutive successful matches are normally sufficient to establish a new target, but in this state the target must appear continuously and stably for 10 frames.

[0138] The above methods effectively filter out momentary false targets caused by environmental vibrations.
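The confidence-dependent trajectory management in steps S1581 to S1583 can be summarized as a small policy object. The retention values (30 versus 60 frames) and confirmation values (3 versus 10 frames) follow the examples above. The process-noise multiplier of 4.0 is hypothetical, and so is setting the downgrade threshold equal to the second preset threshold (0.4). The class and function names are illustrative.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class TrackPolicy:
    process_noise_scale: float  # multiplier applied to the filter's process noise matrix Q
    max_lost_frames: int        # frames a lost trajectory is retained before deletion
    min_hits_to_confirm: int    # consecutive stable matches required for a new trajectory


def track_policy(conf: float, downgrade_threshold: float = 0.4) -> TrackPolicy:
    """Pick lifecycle parameters from the confidence score of the global motion estimate."""
    if conf <= downgrade_threshold:  # degraded global motion estimate: be conservative
        return TrackPolicy(process_noise_scale=4.0, max_lost_frames=60, min_hits_to_confirm=10)
    return TrackPolicy(process_noise_scale=1.0, max_lost_frames=30, min_hits_to_confirm=3)


def inflated_process_noise(q: np.ndarray, policy: TrackPolicy) -> np.ndarray:
    """Scale a Kalman filter's process noise covariance Q according to the policy."""
    return policy.process_noise_scale * q


policy = track_policy(0.3)
print(policy)
print(inflated_process_noise(np.eye(4) * 0.01, policy))
```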

[0139] The multi-target tracking device provided in the embodiments of the present invention is described below. The multi-target tracking device described below can be referred to in correspondence with the multi-target tracking method described above.

[0140] Figure 5 is a schematic structural diagram of the multi-target tracking device provided by the present invention. As shown in Figure 5, the device includes a set acquisition module 510, a global motion estimation module 520, a confidence assessment module 530, a gating processing module 540, and an association result determination module 550, wherein:
- the set acquisition module 510 is used to acquire a video frame sequence, perform target detection on the current video frame to obtain a detection box set, and perform state prediction on the historical trajectory set to obtain a predicted trajectory set;
- the global motion estimation module 520 is used to estimate the current global motion transformation matrix between the current video frame and the previous video frame in the video frame sequence, and to extract estimation process information;
- the confidence assessment module 530 is used to calculate the confidence score of the current global motion transformation matrix based on the estimation process information;
- the gating processing module 540 is used to perform adaptive gating processing on the current global motion transformation matrix based on the confidence score to generate a target compensation matrix;
- the association result determination module 550 is used to perform data association and trajectory update on the detection box set and the predicted trajectory set based on the confidence score and the target compensation matrix to obtain the target association result.

[0141] The multi-target tracking device provided in this invention first acquires a video frame sequence, performs target detection on the current video frame to obtain a set of detection boxes, and performs state prediction on the historical trajectory set to obtain a set of predicted trajectories. Then, it estimates the current global motion transformation matrix between the current video frame and the previous video frame in the video frame sequence and extracts the estimation process information. Based on the estimation process information, it calculates the confidence score of the current global motion transformation matrix, realizing dynamic perception and quantification of the global motion estimation quality, overcoming the shortcomings of existing technologies that blindly rely on compensation results in complex environments. Next, it performs adaptive gating processing on the current global motion transformation matrix based on the confidence score to generate a target compensation matrix. This enables adaptive interception, weakening, or correction of abnormal matrices in high-frequency jitter or attitude change scenarios such as robot dog inspections, blocking the injection of erroneous global displacement interference at its source. Finally, based on the confidence score and the target compensation matrix, data association and trajectory update are performed on the detection box set and the predicted trajectory set to obtain the target association result. The reliability assessment of global motion is deeply integrated into the underlying tracking logic, which suppresses the error injection and cumulative propagation caused by motion compensation failure in strong jitter scenarios, reduces trajectory drift and identity switching, and improves the stability and continuity of multi-target tracking.

[0142] According to a multi-target tracking device provided by the present invention, the estimation process information includes at least matching statistics, estimation stability information, and distribution coverage information. The confidence assessment module 530 includes:
- a first determining unit, used to determine the inlier consistency index based on the total number of matching point pairs and the number of inliers in the matching statistics;
- a second determining unit, used to determine the temporal continuity index based on the degree of difference between the current global motion transformation matrix and the previous global motion transformation matrix in the estimation stability information;
- a third determining unit, used to determine the distribution coverage rate index based on the number of grid divisions and the grid occupancy information in the distribution coverage information;
- a confidence assessment unit, used to perform weighted fusion of the inlier consistency index, the temporal continuity index, and the distribution coverage index to obtain the confidence score.

[0143] According to a multi-target tracking device provided by the present invention, the confidence assessment unit is specifically used to:
- compute the weighted sum of the inlier consistency index, the temporal continuity index, and the distribution coverage index to obtain the basic fusion value;
- process the basic fusion value using a preset function to obtain the confidence score, wherein the preset function is any one of a truncation function, a piecewise function, or a nonlinear mapping function.

[0144] According to a multi-target tracking device provided by the present invention, the gating processing module 540 includes: The first processing unit is used to use the current global motion transformation matrix as the target compensation matrix when the confidence score is greater than or equal to a first preset threshold. The second processing unit is used to perform adaptive attenuation processing on the current global motion transformation matrix to obtain the target compensation matrix when the confidence score is less than the first preset threshold and greater than the second preset threshold. The third processing unit is used to determine the unit transformation matrix as the target compensation matrix when the confidence score is less than or equal to the second preset threshold.

[0145] According to a multi-target tracking device provided by the present invention, the second processing unit is specifically used for: The target gating coefficient is determined based on the confidence score, the first preset threshold, and the second preset threshold. Using the target gating coefficient, interpolation is performed between the current global motion transformation matrix and the unit transformation matrix to generate the target compensation matrix.

[0146] According to a multi-target tracking device provided by the present invention, the association result determination module 550 includes:
- a compensation unit, used to compensate the predicted trajectory set based on the target compensation matrix;
- a first calculation unit, used to calculate the geometric distance based on the detection box set and the compensated predicted trajectory set;
- an extraction unit, used to extract the appearance feature vector corresponding to each detection box in the detection box set to obtain an appearance feature set;
- a second calculation unit, used to calculate the appearance distance based on the appearance feature set and the historical appearance feature vectors of each trajectory in the predicted trajectory set;
- a weight adjustment unit, used to adjust the weights of the geometric distance and the appearance distance based on the confidence score;
- a third calculation unit, used to calculate the association cost based on the adjusted weights, the geometric distance, and the appearance distance;
- a matching calculation unit, used to perform matching calculations on the detection box set and the predicted trajectory set based on the association cost to obtain the matching result;
- a fourth determining unit, used to update the trajectories based on the matching result to determine the target association result.

[0147] According to a multi-target tracking device provided by the present invention, the matching result includes at least one of: successfully matched target detection boxes and target predicted trajectories, unmatched predicted trajectories, and unmatched detection boxes. The fourth determining unit is specifically used for at least one of the following processes:
- performing a filter state update on the target predicted trajectory based on the position of the target detection box to obtain the updated target trajectory, and increasing the process noise in the update when the confidence score is less than or equal to a preset confidence downgrade threshold;
- marking unmatched predicted trajectories as lost to obtain lost trajectories, and extending the retention time of unmatched predicted trajectories when the confidence score is less than or equal to the preset confidence downgrade threshold;
- initializing unmatched detection boxes as new trajectories, and tightening the conditions for establishing new trajectories when the confidence score is less than or equal to the preset confidence downgrade threshold.

The target association result includes at least one of the target trajectory after the state update, the lost trajectory, and the newly generated trajectory.

[0148] It should be noted that the multi-target tracking device provided in this embodiment of the invention can implement all the method steps of the multi-target tracking method embodiment and achieve the same technical effects. Parts and beneficial effects identical to those of the method embodiment are therefore not described in detail here.

[0149] Figure 6 illustrates an example of the physical structure of an electronic device. As shown in Figure 6, the electronic device may include a processor 610, a communications interface 620, a memory 630, and a communication bus 640, through which the processor 610, the communications interface 620, and the memory 630 communicate with one another. The processor 610 can call logic instructions in the memory 630 to execute the multi-target tracking method provided in the above embodiments, the method comprising: acquiring a video frame sequence, performing target detection on the current video frame to obtain a detection box set, and performing state prediction on a historical trajectory set to obtain a predicted trajectory set; estimating the current global motion transformation matrix between the current video frame and the previous video frame in the video frame sequence, and extracting estimation process information; calculating a confidence score of the current global motion transformation matrix based on the estimation process information; performing adaptive gating processing on the current global motion transformation matrix based on the confidence score to generate a target compensation matrix; and performing data association and trajectory updating on the detection box set and the predicted trajectory set based on the confidence score and the target compensation matrix to obtain a target association result.
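The order of operations executed by the processor 610 can be sketched as a single per-frame function. Every stage is passed in as a callable; the parameter names (`detect`, `predict`, `estimate_motion`, `score_confidence`, `gate`, `associate`) are hypothetical placeholders, not interfaces defined by the application.

```python
def track_frame(prev_frame, frame, tracks, *, detect, predict,
                estimate_motion, score_confidence, gate, associate):
    """One iteration of the tracking method; each stage is an injected callable."""
    detections = detect(frame)                            # detection box set
    predicted = [predict(t) for t in tracks]              # predicted trajectory set
    m, process_info = estimate_motion(prev_frame, frame)  # global motion matrix + estimation process information
    confidence = score_confidence(process_info)           # confidence score of m
    compensation = gate(m, confidence)                    # target compensation matrix
    # The confidence score is passed on as well, because association and
    # trajectory updating also depend on it, not only on the compensation matrix.
    return associate(detections, predicted, compensation, confidence)
```

Injecting the stages keeps the control flow of the method visible while allowing any detector, motion estimator, or matcher to be substituted.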

[0150] Furthermore, the logic instructions in the aforementioned memory 630 can be implemented as software functional units and, when sold or used as an independent product, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0151] In another aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the multi-target tracking method provided in the above embodiments, the method comprising: acquiring a video frame sequence, performing target detection on the current video frame to obtain a detection box set, and performing state prediction on a historical trajectory set to obtain a predicted trajectory set; estimating the current global motion transformation matrix between the current video frame and the previous video frame in the video frame sequence, and extracting estimation process information; calculating a confidence score of the current global motion transformation matrix based on the estimation process information; performing adaptive gating processing on the current global motion transformation matrix based on the confidence score to generate a target compensation matrix; and performing data association and trajectory updating on the detection box set and the predicted trajectory set based on the confidence score and the target compensation matrix to obtain a target association result.

[0152] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.

[0153] From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software together with a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the above technical solutions in essence, or the part of them that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the various embodiments or in certain parts of the embodiments.

[0154] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A multi-target tracking method, characterized by comprising: acquiring a video frame sequence, performing target detection on a current video frame to obtain a detection box set, and performing state prediction on a historical trajectory set to obtain a predicted trajectory set; estimating a current global motion transformation matrix between the current video frame and a previous video frame in the video frame sequence, and extracting estimation process information; calculating a confidence score of the current global motion transformation matrix based on the estimation process information; performing adaptive gating processing on the current global motion transformation matrix based on the confidence score to generate a target compensation matrix; and performing data association and trajectory updating on the detection box set and the predicted trajectory set based on the confidence score and the target compensation matrix to obtain a target association result.

2. The multi-target tracking method according to claim 1, characterized in that the estimation process information includes at least matching statistics, estimation stability information, and distribution coverage information, and calculating the confidence score of the current global motion transformation matrix based on the estimation process information includes: determining an inlier consistency index based on the total number of matched point pairs and the number of inliers in the matching statistics; determining a temporal continuity index based on the degree of difference, recorded in the estimation stability information, between the current global motion transformation matrix and the previous global motion transformation matrix; determining a distribution coverage index based on the number of grid divisions and the grid occupancy information in the distribution coverage information; and obtaining the confidence score by weighted fusion of the inlier consistency index, the temporal continuity index, and the distribution coverage index.
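The three indices of claim 2 can be illustrated with the following sketch. The claim does not fix their formulas, so the specific choices below are hypothetical: an inlier ratio with a minimum-match guard, an exponential decay on the difference between consecutive 2x3 affine matrices (with an assumed 20 px translation scale), the fraction of occupied grid cells, and default fusion weights of 0.5/0.3/0.2.

```python
import math

def inlier_consistency(num_matches, num_inliers, min_matches=10):
    """Inlier ratio; 0 when there are too few matched point pairs to trust the estimate."""
    if num_matches < min_matches:
        return 0.0
    return num_inliers / num_matches

def temporal_continuity(m_cur, m_prev, translation_scale=20.0, sigma=1.0):
    """Decays from 1 toward 0 as the 2x3 matrix departs from the previous frame's estimate."""
    (a, b, tx), (c, d, ty) = m_cur
    (a0, b0, tx0), (c0, d0, ty0) = m_prev
    linear = math.sqrt((a - a0) ** 2 + (b - b0) ** 2 + (c - c0) ** 2 + (d - d0) ** 2)
    shift = math.hypot(tx - tx0, ty - ty0) / translation_scale  # pixels -> unitless
    return math.exp(-(linear + shift) / sigma)

def distribution_coverage(inlier_points, width, height, grid=(4, 4)):
    """Fraction of grid cells that contain at least one inlier point."""
    gx, gy = grid
    occupied = {
        # min() guards against float rounding pushing an index to gx or gy
        (min(int(x * gx / width), gx - 1), min(int(y * gy / height), gy - 1))
        for x, y in inlier_points
        if 0 <= x < width and 0 <= y < height
    }
    return len(occupied) / (gx * gy)

def fuse(indices, weights=(0.5, 0.3, 0.2)):
    """Weighted fusion of the (inlier, temporal, coverage) indices."""
    return sum(w * r for w, r in zip(weights, indices))
```

The coverage index penalizes the case where many inliers cluster in one region (for example, a single textured wall), which can yield a high inlier ratio but a poorly constrained global transform.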

3. The multi-target tracking method according to claim 2, characterized in that obtaining the confidence score by weighted fusion of the inlier consistency index, the temporal continuity index, and the distribution coverage index includes: obtaining a basic fusion value by a weighted sum of the inlier consistency index, the temporal continuity index, and the distribution coverage index; and processing the basic fusion value with a preset function to obtain the confidence score, wherein the preset function is any one of a truncation function, a piecewise function, or a nonlinear mapping function.
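One possible instance of each of the three preset-function families named in claim 3 is sketched below, together with the weighted sum. The breakpoints (0.3 and 0.8), the sigmoid center and steepness, and the use of a sigmoid as the nonlinear mapping are hypothetical choices; the functions assume the fused value lies roughly in [0, 1].

```python
import math

def truncation(x, lo=0.0, hi=1.0):
    """Clamp the basic fusion value into [lo, hi]."""
    return max(lo, min(hi, x))

def piecewise(x, lo=0.3, hi=0.8):
    """0 below lo, 1 above hi, linear ramp in between."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)

def sigmoid(x, center=0.5, steepness=10.0):
    """Smooth nonlinear mapping of the basic fusion value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - center)))

def confidence_score(indices, weights, mapping=truncation):
    """Weighted sum of the three indices followed by the chosen preset function."""
    base = sum(w * r for w, r in zip(weights, indices))
    return mapping(base)
```

Truncation preserves the fused value as-is within range, the piecewise form sharpens the decision region, and the sigmoid gives a smooth transition with no corners.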

4. The multi-target tracking method according to claim 1, characterized in that performing adaptive gating processing on the current global motion transformation matrix based on the confidence score to generate the target compensation matrix includes: when the confidence score is greater than or equal to a first preset threshold, using the current global motion transformation matrix as the target compensation matrix; when the confidence score is less than the first preset threshold and greater than a second preset threshold, performing adaptive attenuation processing on the current global motion transformation matrix to obtain the target compensation matrix; and when the confidence score is less than or equal to the second preset threshold, determining the identity transformation matrix as the target compensation matrix.

5. The multi-target tracking method according to claim 4, characterized in that performing adaptive attenuation processing on the current global motion transformation matrix to obtain the target compensation matrix includes: determining a target gating coefficient based on the confidence score, the first preset threshold, and the second preset threshold; and interpolating between the current global motion transformation matrix and the identity transformation matrix using the target gating coefficient to generate the target compensation matrix.
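The gating of claims 4 and 5 can be sketched as follows. Assumptions: matrices are 2x3 affine matrices stored as nested tuples, the first and second preset thresholds default to the hypothetical values 0.7 and 0.3, the gating coefficient maps the score linearly onto (0, 1), and interpolation is element-wise. Element-wise blending is exact for translation and only approximate for rotation, which is usually acceptable for small jitter.

```python
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))

def gating_coefficient(score, t_high, t_low):
    """Linear map of the score from (t_low, t_high) onto (0, 1)."""
    return (score - t_low) / (t_high - t_low)

def gate(m, score, t_high=0.7, t_low=0.3):
    """Three-regime gating of a 2x3 affine matrix m."""
    if t_high <= t_low:
        raise ValueError("first preset threshold must exceed the second")
    if score >= t_high:
        return tuple(tuple(row) for row in m)  # trust the estimate fully
    if score <= t_low:
        return IDENTITY                        # disable compensation
    alpha = gating_coefficient(score, t_high, t_low)
    # Element-wise interpolation: alpha * M + (1 - alpha) * I
    return tuple(
        tuple(alpha * v + (1.0 - alpha) * i for v, i in zip(row, id_row))
        for row, id_row in zip(m, IDENTITY)
    )
```

With these defaults, a score of 0.5 gives a coefficient of 0.5, so an estimated 10 px horizontal shift is applied as roughly 5 px.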

6. The multi-target tracking method according to any one of claims 1 to 5, characterized in that performing data association and trajectory updating on the detection box set and the predicted trajectory set based on the confidence score and the target compensation matrix to obtain the target association result includes: compensating the predicted trajectory set based on the target compensation matrix; calculating the geometric distance between the detection box set and the compensated predicted trajectory set; extracting the appearance feature vector of each detection box in the detection box set to obtain an appearance feature set; calculating the appearance distance between the appearance feature set and the historical appearance feature vectors of the trajectories in the predicted trajectory set; adjusting the weights of the geometric distance and the appearance distance based on the confidence score; calculating the association cost from the adjusted weights, the geometric distance, and the appearance distance; matching the detection box set against the predicted trajectory set based on the association cost to obtain a matching result; and updating the trajectories based on the matching result to determine the target association result.

7. The multi-target tracking method according to claim 6, characterized in that the matching result includes at least one of: successfully matched pairs of a target detection box and a target predicted trajectory, unmatched predicted trajectories, and unmatched detection boxes; and updating the trajectories based on the matching result to determine the target association result includes at least one of the following processes: filtering and updating the target predicted trajectory based on the position of its matched target detection box to obtain an updated target trajectory, and increasing the process noise of the filtering update when the confidence score is less than or equal to a preset confidence downgrade threshold; marking the unmatched predicted trajectories as lost to obtain lost-state trajectories, and extending the retention time of the unmatched predicted trajectories when the confidence score is less than or equal to the preset confidence downgrade threshold; and initializing the unmatched detection boxes as new trajectories, and making the conditions for establishing a new trajectory stricter when the confidence score is less than or equal to the preset confidence downgrade threshold; wherein the target association result includes at least one of the state-updated target trajectory, the lost-state trajectory, and the newly generated trajectory.

8. A multi-target tracking device, characterized by comprising: a set acquisition module, configured to acquire a video frame sequence, perform target detection on a current video frame to obtain a detection box set, and perform state prediction on a historical trajectory set to obtain a predicted trajectory set; a global motion estimation module, configured to estimate a current global motion transformation matrix between the current video frame and a previous video frame in the video frame sequence, and to extract estimation process information; a confidence assessment module, configured to calculate a confidence score of the current global motion transformation matrix based on the estimation process information; a gating processing module, configured to perform adaptive gating processing on the current global motion transformation matrix based on the confidence score to generate a target compensation matrix; and an association result determination module, configured to perform data association and trajectory updating on the detection box set and the predicted trajectory set based on the confidence score and the target compensation matrix to obtain a target association result.

9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the multi-target tracking method according to any one of claims 1 to 7.

10. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the multi-target tracking method according to any one of claims 1 to 7.