A skeleton-independent small-target pedestrian tracking method for high-altitude bird's-eye views from drones

By employing feature confidence assessment and multi-source likelihood fusion, and by using motion pattern distinguishing factors in place of skeleton pose information, the method addresses unstable appearance features in pedestrian tracking from UAV high-altitude overhead views. This enables efficient tracking of similar-looking pedestrians and improves tracking robustness and adaptability.

CN122289730A · Pending · Publication Date: 2026-06-26 · GUANGDONG UNIV OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
GUANGDONG UNIV OF TECH
Filing Date
2026-03-17
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing pedestrian tracking solutions in high-altitude drone scenarios suffer from problems such as small target imaging scale, limited appearance feature information, and motion blur, resulting in low accuracy of traditional skeleton detection algorithms and difficulty in effectively tracking similar pedestrians.

Method used

A method combining feature confidence assessment and multi-source likelihood fusion is adopted, using motion pattern discrimination factors in place of skeleton pose information. Through target detection, feature confidence assessment, confidence-weighted multi-source likelihood fusion, and motion pattern discrimination, robust tracking of similar-looking pedestrians that appear as blurred small targets is achieved.

Benefits of technology

It effectively solves the problem of unstable appearance features in high-altitude overhead scenarios of UAVs, realizes efficient tracking of similar pedestrians, and improves the robustness and adaptability of tracking.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122289730A_ABST

Abstract

This invention discloses a skeleton-independent method for tracking small, similar-looking targets in UAV high-altitude overhead scenarios. In high-altitude aerial imagery, pedestrian targets have extremely low resolution, suffer from motion blur, and look highly similar, which renders traditional skeleton pose features ineffective. To address this, the method first computes a dynamic feature confidence by jointly evaluating the detection box size, the detection confidence, and the local sharpness. Based on this confidence, it adaptively adjusts the fusion weights of spatial, appearance, and motion features in multi-source likelihood matching. Furthermore, to resolve ambiguity between similar targets in congested scenes, it exploits the clearly measurable displacement in the two-dimensional ground plane under the UAV's overhead view to construct motion pattern distinguishing factors (direction consistency and velocity consistency), which replace the ineffective skeleton features and achieve accurate identity resolution. Finally, it combines multi-evidence fusion of spatiotemporal reachability, appearance consistency, and motion continuity to recover trajectories lost due to occlusion. The method overcomes the feature instability and identity ambiguity of small targets under UAV high-altitude overhead views, significantly improving tracking robustness in complex traffic scenarios.

Description

Technical Field

[0001] This invention relates to the field of target tracking technology, and in particular to a pedestrian tracking method for drone high-altitude overhead scenes. Specifically, it concerns a skeleton-independent method for robustly tracking small, similar-looking pedestrian targets based on dynamic feature confidence evaluation, confidence-weighted multi-source likelihood fusion, and motion pattern discrimination factors.

Background Technology

[0002] Traffic congestion has long been a core problem restricting urban development. Drones, with their maneuverability and wide field of view, can monitor large areas in real time from a high-altitude perspective, opening new technological pathways for traffic flow monitoring, public safety management, and urban planning. However, existing drone-based pedestrian tracking solutions for high-altitude scenarios face inherent bottlenecks. First, because drones typically fly at altitudes of tens to hundreds of meters, a pedestrian on the ground occupies only about 20×20 to 50×50 pixels in the image, leaving extremely limited visual feature information. Second, attitude changes and airframe vibration during flight cause motion blur in the aerial images, which blurs pedestrian boundaries and significantly reduces the reliability of visual feature extraction. In essence, there is a fundamental conflict between low-resolution imagery and the need for fine-grained features.

[0003] Meanwhile, appearance similarity further increases the tracking difficulty. In traffic flow scenarios, many pedestrians wear similar clothing, and this similarity is even more pronounced when viewed from directly above. Traditional tracking algorithms often rely on skeleton pose estimation to distinguish pedestrians with similar appearances. Skeleton dependence refers to the approach of extracting human body keypoints (skeletal joints such as the head, shoulders, elbows, and knees) to build a skeleton pose model, and then using the distinctive posture and movement of each pedestrian to distinguish identities. However, because targets are extremely small in UAV high-altitude overhead scenes, existing skeleton detection algorithms achieve very low accuracy on them or fail completely.

[0004] The present invention aims to solve the above technical problems by constructing a skeleton-free tracking scheme: it tightly couples feature confidence assessment with multi-source evidence fusion and uses motion pattern distinguishing factors in place of the skeleton pose information that cannot be extracted, thereby achieving robust tracking of similar-looking pedestrians that appear as blurred small targets.

Summary of the Invention

[0005] The purpose of this invention is to overcome the shortcomings of the prior art and provide a highly adaptable, intelligent, and robust skeleton-free method for tracking small-target pedestrians from the high-altitude overhead perspective of unmanned aerial vehicles (UAVs).

[0006] To achieve the above objectives, the present invention adopts the following technical solution:

[0007] A skeleton-independent method for tracking small, similar-looking pedestrians from a UAV top-down surveillance perspective, characterized by the following steps:

[0008] S1 Target Detection and Feature Confidence Evaluation: Acquire aerial-view monitoring images from the UAV and perform target detection in the traffic scene to extract detection boxes and their detection confidence; to account for the blurriness of aerial images, use a feature confidence model to generate a comprehensive feature confidence from the detection box size, the detection confidence, and the local mutual-information sharpness; and extract deep appearance feature vectors from the detection box regions;

[0009] S2 Confidence-Weighted Multi-Source Likelihood Fusion: To address the appearance instability of blurred small targets, dynamically weight and fuse the spatial, appearance, and motion likelihoods according to the comprehensive feature confidence;

[0010] S3 Motion Pattern Differentiation and Ambiguity Resolution: Based on the association results, identify pairs of targets with extremely similar appearances in the traffic flow, and calculate the motion pattern differentiation factor based on the ground plane projection displacement to resolve ambiguities;

[0011] S4 Multi-Evidence Fusion Recovery: Combine spatiotemporal reachability, appearance consistency, and motion continuity to recover, through evidence fusion, targets lost under traffic congestion or occlusion, and verify the matching results through a progressive identity verification mechanism;

[0012] S5 Trajectory Output: Outputs the drone monitoring trajectory, including time-synchronized identity sequence, bounding box trajectory, and trajectory reliability assessment.

[0013] Furthermore, the feature confidence model in step S1 quantifies the impact of drone jitter and low resolution on feature representation. Let the input vector x be a three-dimensional column vector:

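[0014] $\mathbf{x} = \left[\log(\mathrm{area}),\ \mathrm{conf}_{\mathrm{det}},\ \mathrm{clarity}\right]^{\top}$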

[0015] Here, area is the area of the detection box, representing the target imaging scale; the detection confidence characterizes the detector's certainty that the target exists; and clarity is the local sharpness score, characterizing the degree of blur caused by drone motion or inaccurate focusing at the telephoto end. The feature confidence is calculated as follows:

[0016]

[0017] Here, the weight matrix and bias terms are learned parameters, and the output activation function is the sigmoid; the feature confidence model is pre-trained on labeled samples. Its output, the comprehensive feature confidence, represents how reliable the appearance features of the currently detected target are for identity recognition; a higher value indicates more reliable appearance features.
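The feature confidence model can be illustrated with a minimal sketch. It assumes the two-layer fully connected structure described in the embodiment ([0056]), a ReLU hidden activation, and hand-set placeholder weights; the hidden width, the ReLU, and every numeric value below are assumptions, not the trained parameters of this application.

```python
import numpy as np

def feature_input(box_w, box_h, det_conf, clarity):
    """Build x = [log(area), detection confidence, clarity] for one detection box."""
    area = float(box_w) * float(box_h)
    return np.array([np.log(area), det_conf, clarity], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feature_confidence(x, W1, b1, W2, b2):
    """Two-layer fully connected network with a sigmoid output.

    Assumptions: ReLU hidden activation; W1 has shape (hidden, 3),
    W2 has shape (hidden,), and b2 is a scalar.
    """
    hidden = np.maximum(0.0, W1 @ x + b1)    # hidden layer (ReLU assumed)
    return float(sigmoid(W2 @ hidden + b2))  # comprehensive feature confidence in (0, 1)

# Hypothetical, hand-set (untrained) parameters for illustration only.
W1 = np.eye(3)
b1 = np.zeros(3)
W2 = np.array([0.3, 2.0, 2.0])
b2 = -3.0

c_small_blurry = feature_confidence(feature_input(20, 20, 0.4, 0.2), W1, b1, W2, b2)
c_large_sharp = feature_confidence(feature_input(60, 60, 0.9, 0.9), W1, b1, W2, b2)
```

With these placeholder weights a larger, sharper, more confidently detected box receives a higher confidence. In practice the inputs would likely be normalized before training, since log(area) lies on a different scale from the other two components.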

[0018] Furthermore, in step S2, conditional likelihood functions are established for an existing trajectory and a new detection:

[0019] Spatial likelihood: characterizes how well the location of the new detection matches the trajectory's predicted location, where the predicted location is obtained from a motion model and its uncertainty is described by the prediction covariance matrix;

[0020] Appearance likelihood: characterizes the visual similarity between the new detection and the trajectory's historical appearance, computed from a feature distance function and a temperature parameter;

[0021] Motion likelihood: characterizes the consistency between the motion state of the new detection and the trajectory's historical motion pattern;
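The three likelihood formulas are not reproduced in the text. One common way to realize them is sketched below, assuming a Gaussian spatial likelihood over the Mahalanobis distance, an appearance likelihood of the form exp(-d/tau) with d the cosine distance, and a Gaussian penalty on the velocity difference for motion. The function names and the parameters tau and sigma_v are hypothetical.

```python
import numpy as np

def spatial_likelihood(z, z_pred, S):
    """Gaussian density of detection center z around predicted center z_pred with covariance S."""
    d = np.asarray(z, float) - np.asarray(z_pred, float)
    S = np.asarray(S, float)
    m2 = float(d @ np.linalg.solve(S, d))             # squared Mahalanobis distance
    k = d.size
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** k * np.linalg.det(S))
    return float(norm * np.exp(-0.5 * m2))

def appearance_likelihood(f_det, f_track, tau=0.1):
    """exp(-d / tau) with d = cosine distance between feature vectors (tau assumed)."""
    a = np.asarray(f_det, float)
    b = np.asarray(f_track, float)
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return float(np.exp(-(1.0 - cos) / tau))

def motion_likelihood(v_det, v_track, sigma_v=1.0):
    """Unnormalized Gaussian on the velocity difference (sigma_v assumed)."""
    dv = np.asarray(v_det, float) - np.asarray(v_track, float)
    return float(np.exp(-0.5 * float(dv @ dv) / sigma_v ** 2))
```

A small temperature makes the appearance likelihood sharply peaked, so even modest feature distances are heavily penalized; that is exactly the behavior the confidence weighting is meant to temper for blurred targets.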

[0022] The confidence-weighted log-likelihood fusion formula is as follows:

[0023]

[0024] where the base weight of the spatial term reflects the importance of location information, the appearance and motion terms each have their own base weight, and the feature confidence modulates the relative contributions of the appearance and motion terms.
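The fusion formula itself is not reproduced in the text. The sketch below assumes one simple weighting rule consistent with [0024] and claim 3: the appearance weight is scaled by the feature confidence c, and the weight removed from the appearance term is moved onto the motion term. Both the rule and the default base weights of 1.0 are assumptions; the typical base weights of this application are not given here.

```python
import math

def fused_log_score(L_s, L_a, L_m, c, w_s=1.0, w_a=1.0, w_m=1.0, eps=1e-12):
    """Confidence-weighted log-likelihood fusion (assumed weighting rule).

    The appearance weight is scaled by the feature confidence c in [0, 1];
    the weight removed from the appearance term is added to the motion term,
    so the total weight stays w_s + w_a + w_m for any c.
    """
    c = min(max(c, 0.0), 1.0)
    wa = w_a * c
    wm = w_m + w_a * (1.0 - c)
    return (w_s * math.log(L_s + eps)
            + wa * math.log(L_a + eps)
            + wm * math.log(L_m + eps))

# A fully blurred target (c = 0): the appearance likelihood no longer affects the score.
blurred_good_app = fused_log_score(0.5, 0.99, 0.8, c=0.0)
blurred_bad_app = fused_log_score(0.5, 0.01, 0.8, c=0.0)
# A sharp target (c = 1): appearance matters.
sharp_good_app = fused_log_score(0.5, 0.99, 0.8, c=1.0)
sharp_bad_app = fused_log_score(0.5, 0.01, 0.8, c=1.0)
```

Keeping the total weight constant means scores remain comparable across targets with different feature confidence.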

[0025] Furthermore, the motion pattern distinguishing factor used in step S3 to resolve ambiguity between similar pedestrians is calculated as follows:

[0026] Spatial proximity: for two targets i and j, compute the normalized center distance, i.e., the distance between their box centers divided by h, where h is the height of the detection box;

[0027] Direction consistency: characterizes the agreement between the current direction of motion and the predicted direction;

[0028] Speed consistency: characterizes how well the current speed matches the predicted speed;

[0029] Combined motion discrimination score: combines direction consistency and speed consistency, where a trade-off coefficient balances direction against speed.
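A sketch of the motion pattern distinguishing factor follows. It assumes displacements have already been projected onto the ground plane, takes direction consistency as the cosine between observed and predicted displacement (range [-1, 1], as stated in claim 4), and takes speed consistency as the ratio of the smaller to the larger speed (range [0, 1]). The speed formula and the linear blend with trade-off coefficient lam are assumptions.

```python
import numpy as np

def normalized_center_distance(center_i, center_j, h):
    """Center distance divided by box height h, making the metric scale-independent."""
    return float(np.linalg.norm(np.subtract(center_i, center_j))) / h

def direction_consistency(disp, pred_disp, eps=1e-9):
    """Cosine between observed and predicted ground-plane displacement, in [-1, 1]."""
    a = np.asarray(disp, float)
    b = np.asarray(pred_disp, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def speed_consistency(disp, pred_disp, eps=1e-9):
    """Smaller speed over larger speed, in [0, 1] (assumed form)."""
    s1 = float(np.linalg.norm(disp))
    s2 = float(np.linalg.norm(pred_disp))
    return min(s1, s2) / (max(s1, s2) + eps)

def motion_discrimination_score(disp, pred_disp, lam=0.5):
    """Linear blend of direction and speed consistency; lam is the trade-off coefficient."""
    return (lam * direction_consistency(disp, pred_disp)
            + (1.0 - lam) * speed_consistency(disp, pred_disp))

# Track predicted to move 1 unit along +x; two similar-looking candidates.
pred = [1.0, 0.0]
candidates = {"A": [0.9, 0.1], "B": [-0.2, 1.0]}
scores = {name: motion_discrimination_score(d, pred) for name, d in candidates.items()}
best = max(scores, key=scores.get)
```

Candidate B moves at almost the predicted speed but nearly perpendicular to the predicted direction, so the direction term drives the decision toward candidate A.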

[0030] Furthermore, the progressive identity verification mechanism is determined as follows: after an initial successful match, the track enters a candidate state, and the fusion score is computed for each of the subsequent K consecutive frames to form a score sequence;

[0031] Monotonicity constraint: the fluctuation range of the score sequence must not exceed a preset threshold; this characterizes the stability of the matching quality;

[0032] Stability constraint: the average fusion score must exceed the confirmation threshold; this characterizes the reliability of the matching quality;

[0033] If the above constraints are met, the identity is confirmed to be restored; if the score drops sharply or a match fails during the candidate period, the candidate status is immediately revoked.
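The progressive verification logic can be sketched as a check over the candidate's score sequence. Assumptions: a failed match is encoded as None, a sharp drop is a frame-to-frame decrease larger than drop_threshold, and a candidate that completes K frames without meeting both constraints is revoked. All parameter names are hypothetical.

```python
def verify_candidate(scores, K, max_fluctuation, confirm_threshold, drop_threshold):
    """Progressive identity verification over a candidate's fusion-score sequence.

    Returns "confirmed", "revoked", or "pending". A failed match is encoded as None.
    """
    if any(s is None for s in scores):
        return "revoked"                          # match failed during the candidate period
    for prev, cur in zip(scores, scores[1:]):
        if prev - cur > drop_threshold:
            return "revoked"                      # sharp score drop
    if len(scores) < K:
        return "pending"                          # keep observing
    window = scores[:K]
    stable = max(window) - min(window) <= max_fluctuation     # monotonicity constraint
    reliable = sum(window) / K >= confirm_threshold            # stability constraint
    return "confirmed" if stable and reliable else "revoked"
```

On "revoked", claim 5 specifies that the current detection is initialized as a new trajectory while the original lost trajectory stays in memory awaiting a later match.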

[0034] Furthermore, the multi-evidence fusion in step S4 for handling traffic congestion and occlusion is performed as follows:

[0035] A trajectory memory library is established to store the state vector of each trajectory that enters the lost state, including its disappearance time, last position, motion state, historical mean appearance feature, and the confidence of that feature;

[0036] Evidence E1, spatiotemporal reachability;

[0037] Evidence E2, appearance consistency;

[0038] Evidence E3, motion continuity.
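The three evidence terms can be sketched as follows. E1 follows the structure stated in claim 6: the standard normal CDF applied to the gap between the physical reach bound and the actual displacement, scaled by a distance uncertainty parameter. Here the reach bound is assumed to be a maximum speed times the elapsed time. The E2 form (a similarity raised to the power of the historical confidence) and the E3 form (a Gaussian on the constant-velocity prediction error) are assumptions.

```python
import math
import numpy as np

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def e1_reachability(last_pos, new_pos, dt, v_max, sigma_d):
    """Phi((d_max - d) / sigma_d), with d_max = v_max * dt assumed as the reach bound."""
    d = float(np.linalg.norm(np.subtract(new_pos, last_pos)))
    d_max = v_max * dt
    return normal_cdf((d_max - d) / sigma_d)

def e2_appearance(f_new, f_hist_mean, c_hist):
    """Similarity in [0, 1] raised to the historical confidence (assumed form).

    As c_hist -> 0 the evidence tends to 1, so unreliable history becomes neutral.
    """
    a = np.asarray(f_new, float)
    b = np.asarray(f_hist_mean, float)
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    sim = 0.5 * (1.0 + cos)
    return sim ** c_hist

def e3_motion(last_pos, velocity, new_pos, dt, sigma_p):
    """Gaussian on the error of a constant-velocity prediction (assumed form)."""
    pred = np.asarray(last_pos, float) + dt * np.asarray(velocity, float)
    err = np.asarray(new_pos, float) - pred
    return float(np.exp(-0.5 * float(err @ err) / sigma_p ** 2))
```

Raising the similarity to the power c_hist means that, in a log-linear fusion, E2 contributes c_hist times the log-similarity, which is one way to reduce the influence of blurred historical features.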

[0039] Furthermore, the evidence fusion employs a log-linear model that adaptively adjusts the evidence weights according to target attributes:

[0040] A size indicator variable and an ambiguity indicator variable are defined; the fusion score is computed as:

[0041]

[0042] where the base weights for spatiotemporal, appearance, and motion evidence are adjusted by a size adjustment coefficient and an ambiguity adjustment coefficient.
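A sketch of the log-linear evidence fusion with adaptive weights. The adjustment rule is an assumption: the size coefficient alpha moves weight from appearance to motion for small targets, the ambiguity coefficient beta adds weight to motion when similar-looking neighbors exist, and the weights are then rescaled to keep their original sum, as claim 7 requires. The values of alpha and beta are hypothetical.

```python
import math

def adaptive_weights(w_t, w_a, w_m, is_small, is_ambiguous, alpha=0.3, beta=0.2):
    """Adjust evidence weights by target attributes, then rescale to the original sum.

    is_small / is_ambiguous are the 0/1 indicator variables; alpha and beta are
    hypothetical size and ambiguity adjustment coefficients.
    """
    total = w_t + w_a + w_m
    shift = min(alpha * is_small, w_a)              # keep the appearance weight non-negative
    wt = w_t
    wa = w_a - shift                                # small blurry target: less appearance
    wm = w_m + shift + beta * is_ambiguous          # more motion, and more again if ambiguous
    scale = total / (wt + wa + wm)                  # normalization keeps the weight sum constant
    return wt * scale, wa * scale, wm * scale

def fusion_score(E1, E2, E3, weights, eps=1e-12):
    """Log-linear fusion: weighted sum of log evidences."""
    wt, wa, wm = weights
    return wt * math.log(E1 + eps) + wa * math.log(E2 + eps) + wm * math.log(E3 + eps)

w_plain = adaptive_weights(1.0, 1.0, 1.0, 0, 0)
w_small = adaptive_weights(1.0, 1.0, 1.0, 1, 0)
w_amb = adaptive_weights(1.0, 1.0, 1.0, 1, 1)
```

Because every evidence term lies in (0, 1], each log term is non-positive and a score closer to zero indicates stronger combined evidence for the recovery match.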

[0043] The beneficial effects of this invention are as follows:

[0044] Adaptive feature confidence: the feature confidence model dynamically evaluates the reliability of appearance features from the target scale, detection confidence, and local sharpness, effectively addressing the instability of appearance features caused by drone jitter and low resolution.

[0045] Intelligent multi-source weighting: a confidence-weighted log-likelihood fusion mechanism automatically increases the association weight of kinematic constraints when appearance features become unreliable due to blur, achieving adaptive fusion of spatial, appearance, and motion features.

[0046] Motion patterns in place of skeletons: a motion pattern distinguishing factor is proposed that exploits the clearly measurable two-dimensional planar displacement under the UAV's top-down view to replace fine-grained skeleton pose information that cannot be extracted, resolving the ambiguity between similar-looking targets.

Attached Figure Description

[0047] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below.

[0048] Figure 1 is a schematic diagram of a drone's aerial view of a traffic scene according to an embodiment of the present invention;

[0049] Figure 2 is a diagram of the working framework of the small-target pedestrian tracking system according to an embodiment of the present invention;

[0050] Figure 3 is a flowchart of the small-target pedestrian tracking method according to an embodiment of the present invention.

Detailed Implementation

[0051] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings.

[0052] As shown in Figure 3, this application provides a skeleton-independent method for tracking small-target pedestrians from the high-altitude view of an unmanned aerial vehicle (UAV). The specific steps are as follows:

[0053] S1 Target Detection and Feature Confidence Evaluation:

[0054] As shown in Figure 1, a high-definition camera mounted on a drone was used to acquire a sequence of overhead images of a traffic scene. The drone cruised autonomously within a cooperative airspace at an altitude of 50 to 150 meters, and the onboard camera captured panoramic video of the intersection at 25 to 30 fps with an image resolution of 1920×1080 or higher.

[0055] The YOLOv8 object detection algorithm was used to detect pedestrians in the acquired images, yielding the location of each detection box and its detection confidence. In aerial views from a drone, typical pedestrian detection boxes measure 20×20 to 80×80 pixels.

[0056] As shown in Figure 2, the feature confidence model adopts a two-layer fully connected neural network. The input vector x contains three components: (1) the logarithm of the detection box area; (2) the confidence output by the detector; and (3) the local sharpness score.
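The source describes the third input as a local (mutual-information) sharpness score but does not give its formula. As a stand-in only, the sketch below uses a different, widely used sharpness proxy, the variance of a 4-neighbor discrete Laplacian over a grayscale crop, mapped to [0, 1) by v / (v + k). The proxy and the constant k are assumptions, not the measure used in this application.

```python
import numpy as np

def laplacian_variance(patch):
    """Variance of a 4-neighbor discrete Laplacian over a 2-D grayscale patch (at least 3x3)."""
    p = np.asarray(patch, float)
    lap = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
           - 4.0 * p[1:-1, 1:-1])
    return float(lap.var())

def clarity_score(patch, k=100.0):
    """Map Laplacian variance to [0, 1); k is a hypothetical scale constant."""
    v = laplacian_variance(patch)
    return v / (v + k)

# A sharp checkerboard versus a smooth linear ramp (the Laplacian of a ramp is zero).
sharp = (np.indices((20, 20)).sum(axis=0) % 2) * 255.0
ramp = np.tile(np.arange(20.0), (20, 1))
```

Motion blur suppresses high-frequency content, which lowers the Laplacian variance; this is why such a proxy tends to fall for blurred crops.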

[0057] The dimensions of the model's weight matrices are set as follows: … The model was trained offline on labeled data.

[0058] Deep appearance feature vectors are extracted from the detection box region using a pre-trained ReID network (such as a ResNet50 model), which outputs a 128- or 256-dimensional feature vector.

[0059] S2 confidence-weighted multi-source likelihood fusion:

[0060] As shown in Figure 2, for each new detection in frame t and each existing trajectory, the spatial likelihood, appearance likelihood, and motion likelihood are calculated.

[0061] The core innovation of confidence-weighted fusion is that the feature confidence dynamically adjusts the weights of the appearance and motion terms. In a typical setting, the base weights are … .

[0062] S3 Motion Pattern Differentiation and Ambiguity Resolution:

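[0063] After the initial trajectory-detection association is complete, potentially ambiguous target pairs are identified. A pair is considered ambiguous when it meets both criteria: (1) spatial proximity: the normalized center distance is below the proximity threshold; and (2) appearance similarity: the cosine similarity of the appearance features exceeds the similarity threshold.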
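Ambiguous pair identification can be sketched as a pairwise scan over the current targets. Assumptions: the normalized distance uses the mean height of the two boxes, features are compared by cosine similarity after L2 normalization, and dist_thr and sim_thr are hypothetical placeholder thresholds (the values used in this application are not given).

```python
import numpy as np
from itertools import combinations

def find_ambiguous_pairs(centers, heights, features, dist_thr, sim_thr):
    """Return index pairs (i, j) that are both spatially close and similar-looking."""
    feats = np.asarray(features, float)
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)   # L2-normalize rows
    pairs = []
    for i, j in combinations(range(len(centers)), 2):
        h = 0.5 * (heights[i] + heights[j])                        # assumed: mean box height
        d = float(np.linalg.norm(np.subtract(centers[i], centers[j]))) / h
        sim = float(feats[i] @ feats[j])                           # cosine similarity
        if d < dist_thr and sim > sim_thr:
            pairs.append((i, j))
    return pairs

pairs = find_ambiguous_pairs(
    centers=[(0, 0), (10, 0), (200, 0)],
    heights=[40, 40, 40],
    features=[[1, 0, 0], [0.99, 0.1, 0], [1, 0, 0]],
    dist_thr=1.0,
    sim_thr=0.9,
)
```

Targets 0 and 2 look identical but are far apart, so only the close, similar pair (0, 1) needs the motion discrimination step.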

[0064] The combined motion discrimination score is then calculated for each ambiguous pair.

[0065] S4 Multi-Evidence Fusion Recovery:

[0066] As shown in Figure 2, when a target is temporarily lost due to occlusion or congestion, its state is stored in the trajectory memory library.

[0067] The progressive identity verification mechanism verifies the stability of the matching result using observations from K consecutive frames (typically K = …).

[0068] S5 trajectory output:

[0069] The final output of the UAV monitoring trajectory includes: (1) time-synchronized identity sequence; (2) bounding box trajectory; and (3) trajectory reliability assessment.

[0070] The above description is merely a preferred embodiment of this application and is not intended to limit this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.

Claims

1. A skeleton-independent method for tracking small, similar-looking pedestrians from a top-down monitoring perspective, characterized in that it comprises the following steps: S1: acquiring aerial-view monitoring images from the UAV and performing target detection in traffic scenes to extract detection boxes and their detection confidence scores; considering the blurriness of aerial images, generating a comprehensive feature confidence score from the detection box size, the detection confidence score, and the local mutual-information clarity using a feature confidence model; and extracting deep appearance feature vectors from the detection box regions; S2: to address the appearance instability of blurred small targets, dynamically weighting and fusing the spatial likelihood, appearance likelihood, and motion likelihood based on the comprehensive feature confidence to calculate a target association score and complete preliminary association matching between trajectories and detection results; S3: based on the preliminary association matching results, identifying target pairs with extremely similar appearances in the traffic flow, and calculating direction consistency and velocity consistency from the ground-plane projected displacement as motion pattern distinguishing factors to resolve ambiguities; S4: combining spatiotemporal reachability, appearance consistency, and motion continuity to perform evidence-fusion recovery of targets lost under traffic congestion or occlusion, and verifying the matching results through a progressive identity verification mechanism; S5: outputting the drone monitoring trajectory, including a time-synchronized identity sequence, the bounding box trajectory, and a trajectory reliability assessment.

2. The method according to claim 1, characterized in that the feature confidence model in step S1 quantifies the impact of drone jitter and low resolution on feature representation; the input vector is formed from area, the detection confidence, and clarity, where area is the area of the detection box, representing the target imaging scale; the detection confidence characterizes the detector's certainty that the target exists; and clarity is the local sharpness score, characterizing the degree of blur caused by drone motion or inaccurate focusing at the telephoto end; the feature confidence is computed from the input vector using a weight matrix, bias terms, and a sigmoid activation function; the feature confidence model is pre-trained on samples and outputs the comprehensive feature confidence, which represents the reliability of the appearance features of the currently detected target for identity recognition; a higher value indicates more reliable appearance features.

3. The method according to claim 1, characterized in that, in step S2, conditional likelihood functions are established for an existing trajectory and a new detection: a spatial likelihood, characterizing how well the location of the new detection matches the trajectory's predicted location, where the predicted location is obtained from a motion model and its uncertainty is described by a prediction covariance matrix; an appearance likelihood, characterizing the visual similarity between the new detection and the trajectory's historical appearance, computed from a feature distance function and a temperature parameter; and a motion likelihood, characterizing the consistency between the motion state of the new detection and the trajectory's historical motion pattern; these are combined by a confidence-weighted log-likelihood fusion formula in which the base weight of the spatial term reflects the importance of location information, the appearance and motion terms each have their own base weight, and the feature confidence modulates the relative contributions of the appearance and motion terms: when a target captured by the drone exhibits significant motion blur (feature confidence approaching 0), the system reduces the weight of the appearance term toward 0 and automatically increases the contribution of kinematic constraints.

4. The method according to claim 1, characterized in that the motion pattern discrimination factor for resolving ambiguities between similar pedestrians in step S3 is calculated as follows: spatial proximity is defined by computing, for two targets i and j, the normalized center distance, i.e., the center distance divided by the bounding box height, which makes the distance metric independent of target scale; when the normalized distance is below the proximity threshold and the cosine similarity of the appearance features is above the similarity threshold, the pair is marked as ambiguous; for ambiguous targets, exploiting the physical property that two-dimensional planar displacement is clearly measurable from the UAV's top-down perspective, a motion pattern distinguishing factor is constructed: direction consistency, representing the agreement between the current and predicted directions of motion, with a value range of [-1, 1], where a positive value indicates consistent directions; speed consistency, representing how well the current speed matches the predicted speed, with a value range of [0, 1], where a value close to 1 indicates matching speeds; and a combined motion discrimination score, in which a trade-off coefficient balances direction against speed; when two candidate targets with similar appearances have different motion discrimination scores, the one with the higher score is selected.

5. The method according to claim 1 or 4, characterized in that, in step S4, the progressive identity verification mechanism operates as follows: after an initial successful match, the track enters a candidate state, and the fusion score is computed for each of the subsequent K consecutive frames to form a score sequence; monotonicity constraint: the fluctuation range of the score sequence does not exceed a preset threshold, which characterizes the stability of the matching quality; stability constraint: the average fusion score is higher than the confirmation threshold, which characterizes the reliability of the matching quality; if both constraints are met, the identity is confirmed as restored; if the score drops sharply or matching fails during the candidate period, the candidate state is immediately revoked, the current detection is initialized as a new trajectory, and the original lost trajectory is retained in memory to await further matching.

6. The method according to claim 1, characterized in that the multi-evidence fusion in step S4 for handling traffic congestion and occlusion is performed as follows: a trajectory memory library is established to store the state vector of each trajectory that enters the lost state, including its disappearance time, last position, motion state, historical mean appearance feature, and the confidence of that feature; for each newly appearing unmatched detection and each lost trajectory, multi-source evidence is evaluated: Evidence E1, spatiotemporal reachability: a reachability probability is calculated from kinematic constraints, using the upper bound of physical reach, which is the maximum distance the target can move in the elapsed time, and the actual displacement distance; the standard normal cumulative distribution function maps the distance difference to a probability, scaled by a distance uncertainty parameter; Evidence E2, appearance consistency: characterizes how well the new detection matches the historical appearance, where the confidence of the historical features is used to reduce the contribution of unreliable historical features caused by aerial image blur; Evidence E3, motion continuity: characterizes the consistency between the position and motion state of the new detection and the trajectory prediction.

7. The method according to claim 6, characterized in that the evidence fusion employs a log-linear model that adaptively adjusts the evidence weights according to target attributes: a size indicator variable equals 1 when the target area is smaller than a preset threshold and 0 otherwise, and is used to identify extremely small, blurry targets; an ambiguity indicator variable equals 1 when the target has similar-looking neighboring objects and 0 otherwise; in the fusion score, the base weights for spatiotemporal, appearance, and motion evidence are adjusted by a size adjustment coefficient, which reduces the appearance weight and increases the motion weight for blurry small targets viewed from the UAV's high altitude, and by an ambiguity adjustment coefficient, which further increases the weight of motion evidence when highly similar targets exist; normalization keeps the sum of the adjusted weights constant, maintaining a consistent scoring scale.