An unmanned aerial vehicle cluster cooperative AI visual multi-target tracking method and system

By analyzing multi-view template images and spatial location data, and combining motion trajectory convergence, a multi-target tracking cost function is constructed, which solves the problem of difficulty in distinguishing similar-looking targets in UAV swarms, and realizes accurate collaborative tracking and stable broadcasting of UAV swarms.

CN122289321A · Pending · Publication Date: 2026-06-26 · ZHEJIANG YUNDUOWANG TECH CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
ZHEJIANG YUNDUOWANG TECH CO LTD
Filing Date
2026-05-18
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

In collaborative multi-target tracking of drone swarms, existing technologies ignore local details and make it difficult to distinguish between participating drones that look very similar. This can lead to target loss or misidentification during tracking, affecting the accuracy and fairness of the broadcast.

Method used

By using multi-view template image data and combining regional feature contribution analysis to calculate tracking saliency, and combining spatial location data and trajectory convergence, a multi-target tracking cost function is constructed to achieve collaborative tracking of UAV swarms.

Benefits of technology

It effectively distinguishes between drone targets with similar appearances, reduces the probability of target confusion and misidentification, improves the robustness and environmental adaptability of tracking, and ensures the stability and accuracy of event video broadcasts.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention relates to the field of multi-target tracking technology, and more particularly to an AI-based visual multi-target tracking method and system for collaborative drone swarm tracking. The method includes: acquiring multi-view template images of the target to be tracked, real-time video streams of the drone swarm, and spatial location data; calculating the tracking saliency from different shooting directions, and determining the tracking weight based on the relative spatial angles; extracting real-time image features to calculate appearance matching degree and obtain tracking affiliation degree, and calculating trajectory convergence; constructing a cost function by combining tracking affiliation degree and trajectory convergence, and determining the optimal matching scheme by minimizing the cost function to control collaborative tracking by the drone swarm. This invention effectively solves the problem of identifying and distinguishing highly similar drones by relying on local feature analysis of multi-view images and multi-dimensional fusion evaluation, combined with a globally optimal matching strategy. It enhances tracking stability and accuracy in complex scenarios, avoids tracking anomalies, and ensures orderly and reliable broadcasting of events.

Description

Technical Field

[0001] This invention relates to the field of multi-target tracking technology, and in particular to an AI-based visual multi-target tracking method and system for collaborative drone swarms.

Background Technology

[0002] Among the many tasks in drone swarm collaboration, AI-powered multi-object tracking (MOT) is a core component for achieving intelligent operations. It requires the system not only to detect multiple targets in the frame but also to maintain a unique identifier for each target and continuously predict its position throughout the video sequence. This technology is crucial for drone swarms to perform tasks such as collaborative reconnaissance, target acquisition, and multi-camera intelligent broadcasting of large-scale events. Accurate visual tracking is the foundation for subsequent swarm decisions and actions, and its performance directly determines the success or failure of the entire collaborative system. Especially in high-speed, high-dynamic scenarios, such as intelligent broadcasting of drone racing competitions, extremely high demands are placed on the real-time performance, accuracy, and robustness of tracking algorithms.

[0003] In multi-target tracking applications involving drone swarms, especially in specific scenarios such as competitive broadcasting, the drones participating in the flight field typically have highly similar appearances and paint schemes, resulting in minimal visual differences between targets at the visual perception level. This high degree of similarity makes it extremely easy for drone swarms to confuse targets during multi-target tracking tasks. Tracking algorithms struggle to accurately distinguish and lock onto the designated target drone, leading to deviations in the broadcast footage, with the camera mistakenly following non-target drones. This can cause serious disputes in the competition, affecting its fairness and entertainment value.

[0004] Existing technologies typically analyze the overall features of grayscale images. Their fundamental limitation is that they do not account for the highly similar appearance of the participating drones: when the overall grayscale distributions of multiple targets are similar, the targets cannot be effectively distinguished. This coarse-grained analysis, which ignores differences in local detail, frequently causes the tracked object to be lost or misidentified during actual tracking, resulting in broadcast errors.

Summary of the Invention

[0005] Existing technologies rely on coarse-grained global feature analysis that ignores local details, so they struggle to distinguish participating drones with highly similar appearances; this leads to target loss or misidentification during tracking and to broadcast errors. To address this problem, this invention provides solutions in the following aspects.

[0006] In a first aspect, an AI-based visual multi-target tracking method for collaborative drone swarm tracking includes: acquiring multi-view template image data of the target to be tracked, as well as real-time video stream data and spatial location data collected by the drone swarm during flight; calculating the tracking saliency of the target under different shooting directions based on the multi-view template image data using a regional feature contribution analysis method, where the tracking saliency characterizes the identification contribution weight of local regions of the target image in distinguishing different targets to be tracked; calculating the relative spatial angle between the shooting drone and the target to be tracked based on the spatial location data, and determining the tracking weight under the current shooting viewpoint by combining the tracking saliency; extracting features from the real-time video stream data, calculating the appearance matching degree between the real-time image features and the multi-view template image data, and calculating the tracking affiliation degree by combining the tracking weight; calculating the trajectory convergence between the shooting drone and the target to be tracked based on the time series of the spatial location data; and constructing a multi-target tracking cost function by combining the tracking affiliation degree and the trajectory convergence, determining the optimal tracking matching scheme by minimizing the cost function, and controlling the drone swarm to collaboratively track the target to be tracked.

[0007] Preferably, the step of calculating the tracking saliency of the target under different shooting directions includes: dividing the template image of the target to be tracked into an overall region and several sub-regions; calculating a first feature distribution difference value between the target to be tracked and the interfering target in the overall region; calculating a second feature distribution difference value between the target to be tracked and the interfering target in the region remaining after any one sub-region is removed; and determining the tracking saliency of that sub-region based on the difference between the first feature distribution difference value and the second feature distribution difference value.

[0008] Preferably, the step of calculating the appearance matching degree between the real-time image features and the multi-view template image data includes: using edge detection operators to extract edge pixel feature sets from the real-time video stream images and the template images, respectively; constructing a similarity matrix between the edge features of the real-time image and the edge features of the template image; and using an optimal assignment algorithm to find the maximum matching combination in the similarity matrix, and calculating the cumulative value or mean of the local feature similarities in the maximum matching combination as the appearance matching degree.

[0009] Preferably, the trajectory convergence is calculated as follows: difference processing is performed on the historical spatial location data of the drone and the target to be tracked to obtain velocity or acceleration time series; a time-series alignment algorithm is then used to calculate the nonlinear distance between the time series of the drone and that of the target to be tracked, and this nonlinear distance is used as the trajectory convergence.

[0010] Preferably, the step of determining the tracking weight under the current shooting angle includes: calculating, for each axis of the spatial coordinate system, the deviation between the angle that the line connecting the drone and the target to be tracked makes with that axis and the corresponding angle of the preset standard shooting direction, and accumulating these deviations to obtain a cumulative deviation value; and multiplying the cumulative deviation value by, or fusing it by weighting with, the tracking saliency of the corresponding shooting direction to obtain the tracking weight.

[0011] Preferably, the step of constructing the multi-target tracking cost function includes: calculating the product or weighted sum of the tracking affiliation degree and the trajectory convergence to obtain the tracking score for each pair of drone and target to be tracked; and iterating through all preset tracking allocation schemes and selecting the scheme with the smallest sum of tracking scores as the optimal tracking matching scheme.

[0012] Preferably, the multi-view template image data is a grayscale template image of the target to be tracked under six preset standard shooting directions: up, down, left, right, front, and back; the real-time video stream data is processed by continuous frame extraction and grayscale conversion to obtain a flight grayscale image sequence for tracking.

[0013] In a second aspect, an AI visual multi-target tracking system for drone swarm collaboration includes a processor and a memory, wherein the memory stores computer program instructions, and when the computer program instructions are executed by the processor, the aforementioned AI visual multi-target tracking method for drone swarm collaboration is implemented.

[0014] The present invention has the following effects: 1. This invention divides multi-view template images into overall regions and local sub-regions, and quantifies the identification contribution weight of different local regions through regional feature contribution analysis. It accurately mines the subtle local feature differences between highly similar drones, effectively solving the technical problem of difficulty in distinguishing highly similar participating drones, and fundamentally reducing the probability of target confusion and misidentification.

[0015] 2. This invention integrates multiple dimensions of information, including shooting angle deviation, local feature tracking saliency, image appearance matching characteristics, and temporal motion trajectory association, and combines spatial angle constraints, edge feature matching, and temporal motion distance measurement to comprehensively construct a quantitative evaluation system. This system compensates for the shortcomings of single visual features being susceptible to interference from image tilt, shaking, and occlusion, and significantly improves the robustness and environmental adaptability of target tracking under complex flight conditions.

[0016] 3. This invention constructs a multi-target tracking cost function and uses a globally optimal allocation strategy to determine the matching scheme between the UAV swarm and the targets to be tracked. This enables collaborative and accurate pairing and tracking of multiple shooting UAVs and multiple similar targets, effectively avoiding problems such as target loss, trajectory jumps, and tracking errors in traditional tracking methods. It ensures stable and accurate broadcast of event videos, eliminates disputes caused by tracking errors, and meets the practical application needs of continuous and stable multi-target collaborative tracking of UAV swarms.

Attached Figure Description

[0017] Figure 1 is a flowchart of steps S1-S6 in an AI visual multi-target tracking method for drone swarm collaboration according to an embodiment of the present invention.

[0018] Figure 2 is a structural block diagram of an AI visual multi-target tracking system for drone swarm collaboration according to an embodiment of the present invention.

Detailed Implementation

[0019] The technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are some embodiments of the present invention, but not all embodiments.

[0020] Specific implementation scenario: In a competition venue with a racing track and obstacle course, a swarm of tracking drones equipped with visual acquisition and positioning devices collaboratively tracks multiple participating FPV racing drones that are highly similar in appearance, structure, and size, that maneuver at high speed, and that frequently cross and occlude one another. During tracking, the relative pose of each tracking drone and its target changes dynamically in real time, and the acquired images are prone to interference such as tilt, shake, motion blur, and target occlusion. To support real-time event broadcasting, flight trajectory recording, determination of crossing order, and identification of collision liability in such complex dynamic scenarios, the system aims to achieve stable, continuous, and error-free tracking of the participating drones.

[0021] Referring to Figure 1, an AI-based visual multi-target tracking method for drone swarm collaboration includes steps S1-S6, as detailed below.

S1: Acquire multi-view template image data of the target to be tracked, as well as real-time video stream data and spatial location data collected by the drone swarm during flight.

[0022] First, the tracking drone swarm and the participating drones to be tracked are located in the target flight field. The infrared dual-light cameras carried by the tracking drones collect, in real time, flight video data of any drone to be tracked in the field from takeoff to the current moment. The collected video data then undergoes continuous frame extraction and grayscale conversion to obtain the corresponding flight grayscale images.
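The frame extraction and grayscale conversion described above can be illustrated with a minimal sketch. The patent does not specify the conversion formula or the sampling interval; the BT.601 luma weights, the `step` parameter, and the function names below are assumptions chosen for illustration only.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each in 0..255
Frame = List[List[Pixel]]      # rows of pixels
GrayFrame = List[List[int]]


def to_gray(frame: Frame) -> GrayFrame:
    """Convert an RGB frame to 8-bit grayscale with BT.601 luma weights (assumed)."""
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in frame
    ]


def extract_gray_sequence(frames: Sequence[Frame], step: int = 1) -> List[GrayFrame]:
    """Keep every `step`-th frame (step=1 keeps all frames) and convert it to grayscale."""
    if step < 1:
        raise ValueError("step must be >= 1")
    return [to_gray(f) for f in frames[::step]]
```

With `step=1` every frame is retained, which is the closest reading of "continuous frame extraction"; a larger step trades temporal resolution for lower processing load.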

[0023] Template images of the target to be tracked are acquired in the preset standard shooting directions, such as up, down, left, right, front, and back. These images are then converted to grayscale to obtain template grayscale images. Target detection and foreground extraction are performed on the template grayscale images, removing irrelevant pixels such as the background and retaining only the target's pixel set, which forms a complete overall region under the current shooting direction. This overall region comprises all effective pixels of the target, including all of its appearance structure and edge features, and serves as the baseline region for feature analysis. The baseline region is then uniformly segmented into several sub-regions of equal size that do not overlap and that together cover the region completely, with no pixels omitted.
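A minimal sketch of the uniform sub-region segmentation is shown below. It assumes, for simplicity, that the foreground region has already been cropped to a rectangular grayscale array whose sides are divisible by the grid size; the patent describes a pixel set and does not fix the grid, so `rows`, `cols`, and the function name are hypothetical.

```python
from typing import List

Gray = List[List[int]]


def split_into_subregions(region: Gray, rows: int, cols: int) -> List[Gray]:
    """Split a rectangular region into rows*cols equal, non-overlapping tiles.

    Tiles are returned in row-major order and together cover every pixel once.
    """
    height, width = len(region), len(region[0])
    if height % rows or width % cols:
        raise ValueError("region size must be divisible by the grid size")
    tile_h, tile_w = height // rows, width // cols
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tile = [row[c * tile_w:(c + 1) * tile_w]
                    for row in region[r * tile_h:(r + 1) * tile_h]]
            tiles.append(tile)
    return tiles
```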

[0024] Because the appearance and visual features of multiple participating drones are highly similar, it is difficult to achieve stable differentiation based solely on overall image features. Furthermore, the contribution of different local regions to target identification varies significantly; some regions exhibit high feature redundancy and weak distinguishing ability, while others possess strong feature uniqueness and can serve as key identification criteria. To accurately quantify the effective identification ability of each local region in distinguishing different targets to be tracked, and to avoid misassignment due to feature confusion, the specific step is as follows.

S2: Based on the multi-view template image data, the tracking saliency of the target under different shooting directions is calculated by the regional feature contribution analysis method. The tracking saliency is used to characterize the identification contribution weight of a local area of the target image when distinguishing different targets.

[0025] For a target template image under a given shooting direction, the complete overall region is uniformly divided into several sub-regions that do not overlap and that together cover the region completely. A statistical distribution distance metric algorithm is used to calculate the difference in grayscale feature distribution between the target and the other targets within the complete overall region, giving the first feature distribution difference value.

[0026] For each sub-region, the remaining area in the template image after removing that sub-region is used as the comparison area. The same algorithm is used to calculate the difference in gray-scale feature distribution between the target to be tracked and other targets to be tracked in the remaining area to obtain the second feature distribution difference value.

[0027] The tracking saliency of the current sub-region under the corresponding shooting direction is determined based on the difference between the first feature distribution difference value and the second feature distribution difference value. The larger this difference, the stronger the contribution of the sub-region to target differentiation and the higher its tracking saliency.
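The leave-one-region-out computation in the preceding paragraphs can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: sub-regions are passed as dictionaries of pixel lists, the histogram uses 8 bins, and total variation distance stands in for whichever statistical distribution distance is actually used. All function names are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence

Hist = List[float]


def histogram(pixels: Sequence[int], bins: int = 8, levels: int = 256) -> Hist:
    """Normalized grayscale histogram of a flat pixel list."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // levels, bins - 1)] += 1
    total = float(len(pixels)) or 1.0
    return [c / total for c in counts]


def total_variation(p: Hist, q: Hist) -> float:
    """Stand-in distribution distance: half the L1 distance between histograms."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def subregion_saliency(
    target: Dict[Hashable, Sequence[int]],
    other: Dict[Hashable, Sequence[int]],
    distance: Callable[[Hist, Hist], float] = total_variation,
) -> Dict[Hashable, float]:
    """Saliency of sub-region k = D(overall region) - D(region with k removed).

    `target` and `other` map a sub-region id to that sub-region's pixels.
    """
    def pooled(regions, skip=None):
        return [p for key, px in regions.items() if key != skip for p in px]

    d_full = distance(histogram(pooled(target)), histogram(pooled(other)))
    return {
        key: d_full - distance(histogram(pooled(target, key)),
                               histogram(pooled(other, key)))
        for key in target
    }
```

In a toy case where two drones share an identical body but differ in the tail, removing the tail collapses the distance to zero, so the tail receives the highest saliency.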

[0028] In another embodiment, the feature distribution difference value is calculated by a statistical distribution distance metric algorithm. Such an algorithm quantitatively characterizes the dispersion and similarity of two grayscale feature distributions, thereby reflecting the magnitude of the visual feature differences between targets in the corresponding image regions. The statistical distribution distance metric algorithms include KS divergence, Bhattacharyya distance, and JS divergence. Any one of them can be used on its own, or several can be combined, according to the tracking accuracy and computational efficiency requirements.
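As one concrete instance of such a metric, the JS (Jensen-Shannon) divergence between two normalized grayscale histograms can be computed as in the sketch below. The base-2 logarithm, which bounds the result to the range 0 to 1, and the function names are choices made for this example.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in bits; bins where p is zero contribute nothing."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: symmetric, 0 for identical histograms."""
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture; positive wherever p or q is
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Because the mixture `m` is nonzero wherever either histogram is nonzero, the divergence stays finite even when the two histograms do not overlap, which is convenient for sparse grayscale histograms.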

[0029] To this end, by dividing the template grayscale images under different shooting directions into regions, the grayscale feature distributions of the overall image and each sub-region image are constructed respectively. The distinguishing contribution of the overall and local features is compared and analyzed, and the feature weight of the highly recognizable local regions is strengthened, thereby achieving accurate differentiation and stable tracking of multiple similar drones.

[0030] After the tracking saliency of each target under the different shooting angles has been quantified, the actual spatial attitude relationship between the shooting drone and the target must also be taken into account to measure how suitable the current shooting angle is. Since both the drone swarm and the target are in continuous dynamic flight, their relative poses change constantly in real time, and the actual shooting angle continuously deviates from the preset standard shooting direction, which affects the effective identification of local target features. To combine the degree of angle deviation with the inherent distinguishing ability of the target features, and to comprehensively quantify the tracking adaptability under different angles, spatial angle deviation is introduced for joint analysis. The specific step is as follows.

S3: Based on the spatial location data, calculate the relative spatial angle between the drone and the target to be tracked, and combine the tracking saliency to determine the tracking weight under the current shooting perspective.

[0031] First, based on the real-time three-dimensional spatial coordinates of the drone and the target to be tracked, the actual included angle between the line connecting the two and each axis of the spatial coordinate system is calculated. The difference between each actual included angle and the reference included angle on the corresponding axis of the preset standard shooting direction (up, down, left, right, front, or back) is then calculated to obtain the included angle deviation value for each axis. The included angle deviation values of all axes are accumulated to obtain the included angle deviation accumulation value, which characterizes the degree of deviation between the current shooting direction and the preset standard shooting direction: the smaller the accumulation value, the closer the current shooting angle is to the standard shooting direction.

[0032] Subsequently, the accumulated value of the included angle deviation is multiplied by the tracking saliency under the corresponding preset standard shooting direction, or a weighted fusion process is performed using preset weight coefficients according to actual tracking needs to obtain the tracking weight. The tracking weight comprehensively reflects the adaptability of the current shooting angle and the identification contribution of the target local area, providing a quantitative basis for the accurate matching and tracking attribution determination of the target to be tracked.
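The angle deviation and tracking weight of step S3 can be sketched as below. The axis convention (+x front, +y left, +z up), the representation of each standard shooting direction as a unit line-of-sight vector, and all names are assumptions made for this example; the patent does not define them. The product form follows the text, and a weighted fusion could be substituted.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical convention: line of sight from drone to target for each standard view.
STANDARD_DIRECTIONS = {
    "front": (1.0, 0.0, 0.0), "back": (-1.0, 0.0, 0.0),
    "left": (0.0, 1.0, 0.0), "right": (0.0, -1.0, 0.0),
    "up": (0.0, 0.0, 1.0), "down": (0.0, 0.0, -1.0),
}


def axis_angles(v: Vec3) -> Tuple[float, ...]:
    """Included angles (radians) between vector v and the x, y, z axes."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0:
        raise ValueError("drone and target positions coincide")
    return tuple(math.acos(max(-1.0, min(1.0, c / norm))) for c in v)


def cumulative_angle_deviation(drone: Vec3, target: Vec3, direction: str) -> float:
    """Sum over axes of |actual included angle - reference included angle|."""
    line = tuple(t - d for d, t in zip(drone, target))
    actual = axis_angles(line)
    reference = axis_angles(STANDARD_DIRECTIONS[direction])
    return sum(abs(a - r) for a, r in zip(actual, reference))


def tracking_weight(deviation: float, saliency: float) -> float:
    """Product form named in the text: accumulated deviation times tracking saliency."""
    return deviation * saliency
```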

[0033] The tracking weights obtained for the different shooting angles express only how well the viewpoint is adapted; on their own they cannot complete cross-viewpoint target feature matching and identity association. During drone flight, images are prone to distortion, occlusion, and viewpoint shifts, and template features from a single angle are difficult to adapt to dynamically acquired real-time video images. Effective visual features such as edges are therefore needed to establish feature associations between real-time images and multi-view template images. To fully integrate visual appearance similarity with the viewpoint adaptation weights, comprehensively evaluate how specifically the shooting drone matches each target to be tracked, and further improve the accuracy of multi-target differentiation, the specific step is as follows.

S4: Extract features from the real-time video stream data, calculate the appearance matching degree between the real-time image features and the multi-view template image data, and combine the tracking weights to calculate the tracking affiliation degree.

[0034] Edge detection operators are used to process the real-time video stream image and the template image respectively to extract the corresponding edge pixel feature sets. Based on the local similarity of the two sets of edge pixel features, a similarity matrix between the edge features of the real-time image and the edge features of the template image is constructed. The optimal assignment algorithm is used to perform global matching on the similarity matrix to determine the maximum matching combination between feature points. The cumulative or mean value of the local similarities of all feature points within the maximum matching combination is calculated, and this cumulative or mean value is used as the appearance matching degree of the current target to be tracked. Suitable optimal assignment algorithms for matching the feature points include, but are not limited to, the KM (Kuhn-Munkres) algorithm and the Hungarian algorithm.

[0035] In another embodiment, edge pixel features of the real-time image and the template image are extracted by an edge detection operator, and a similarity matrix is constructed based on the structural similarity of the neighboring pixels of the feature points. An optimal assignment algorithm is used to complete the one-to-one optimal matching between feature points, and the mean or sum of the similarity of the successfully matched feature points is defined as the appearance matching degree. The optimal assignment algorithm can be either the KM algorithm or the Hungarian algorithm.
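The similarity-matrix and assignment stage of step S4 can be sketched as follows. The sketch assumes edge features have already been extracted as small patches of neighboring gray values, uses a simple mean-absolute-difference similarity as a stand-in for the structural similarity mentioned above, and replaces the KM/Hungarian algorithm with a brute-force search that is only practical for a handful of features. All names are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence

Patch = Sequence[int]  # gray values in the neighborhood of one edge pixel


def patch_similarity(a: Patch, b: Patch) -> float:
    """Similarity in [0, 1]: 1 minus the mean absolute gray difference over 255."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / (255.0 * len(a))


def similarity_matrix(live: Sequence[Patch], template: Sequence[Patch]) -> List[List[float]]:
    """Rows index real-time edge features, columns index template edge features."""
    return [[patch_similarity(a, b) for b in template] for a in live]


def appearance_matching_degree(sim: List[List[float]]) -> float:
    """Mean similarity of the best one-to-one matching (brute-force stand-in)."""
    n_rows, n_cols = len(sim), len(sim[0])
    if n_rows > n_cols:  # transpose so the smaller side is matched exhaustively
        sim = [list(col) for col in zip(*sim)]
        n_rows, n_cols = n_cols, n_rows
    best = max(
        sum(sim[i][j] for i, j in enumerate(cols))
        for cols in permutations(range(n_cols), n_rows)
    )
    return best / n_rows
```

For realistic feature counts, a polynomial-time solver such as SciPy's `scipy.optimize.linear_sum_assignment` (with `maximize=True`) would be the usual replacement for the exhaustive search.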

[0036] Although the tracking affiliation degree fuses visual features with viewpoint weights, image appearance features alone are susceptible to interference from image distortion, partial occlusion, and abrupt viewpoint changes, so matching results based on the visual dimension alone have limited robustness. The drone and the target being tracked exhibit continuous and correlated motion patterns during flight; temporal motion information such as spatial position, flight speed, and motion trends can objectively reflect their dynamic relationship and serve as an effective supplementary constraint on visual matching. To add a motion-based measure of association, strengthen the rationality and anti-interference capability of target matching from a temporal motion perspective, and further improve the stability of multi-target tracking, the specific step is as follows.

S5: Based on the time series of spatial location data, calculate the trajectory convergence between the drone and the target to be tracked.

[0037] The historical 3D spatial coordinate data of the drone and the target to be tracked are processed using temporal difference to obtain time series data of corresponding flight speeds or accelerations, thus revealing the dynamic motion patterns of the drone and the target. A temporal alignment algorithm is used to calculate the nonlinear matching distance between the two sets of motion time series. This nonlinear matching distance reflects the degree of similarity between the drone and the target in terms of motion trends and trajectory changes, and is used as the trajectory convergence. The temporal alignment algorithm, including but not limited to Dynamic Time Warping (DTW), is used to achieve precise alignment of motion sequences of different durations and phases.

[0038] In another embodiment, based on the continuous spatial position time series of the drone and the target to be tracked, the flight velocity sequence and acceleration sequence are obtained by first-order difference or second-order difference, respectively; a time alignment algorithm is used to perform nonlinear alignment and distance measurement on the flight velocity sequence and acceleration sequence, and the calculated sequence distance is defined as the trajectory convergence; the time alignment algorithm is preferably the dynamic time warping (DTW) algorithm.
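A minimal sketch of the differencing and DTW steps is given below. It reduces each 3D velocity to a scalar speed and uses the absolute difference as the local cost; both are simplifications chosen for this example, and the function names and the `dt` sampling interval are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def speeds(positions: Sequence[Point], dt: float = 1.0) -> List[float]:
    """First-order difference of positions, reduced to speed magnitude per step."""
    out = []
    for (x0, y0, z0), (x1, y1, z1) in zip(positions, positions[1:]):
        out.append(((x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2) ** 0.5 / dt)
    return out


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance with |x - y| as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: advance a only, advance b only, advance both.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

The trajectory convergence for a drone and target pair would then be `dtw_distance(speeds(drone_track), speeds(target_track))`; because DTW tolerates sequences of different lengths and phases, the two tracks need not be sampled over identical windows.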

[0039] With the tracking affiliation degree and trajectory convergence calculated above, the degree of association between the drone and each tracked target has been quantitatively evaluated from both the visual feature matching dimension and the dynamic motion association dimension. However, single-dimensional evaluation indicators have limitations; they cannot comprehensively and objectively measure the rationality of the overall matching combination, and they make it difficult to achieve a globally optimal allocation between the drone swarm and multiple targets. Therefore, visual matching characteristics and temporal motion association characteristics are integrated into a unified quantitative evaluation standard, a global cost constraint relationship is constructed, and the optimal pairing combination is selected through global optimization. This avoids problems such as local optima, target mismatch, and repeated tracking, and ensures the overall stability and matching accuracy of drone swarm collaborative tracking. The specific step is as follows.

S6: By combining the tracking affiliation degree and the trajectory convergence, a multi-target tracking cost function is constructed. The optimal tracking matching scheme is determined by minimizing the cost function, and the UAV swarm is controlled to perform collaborative tracking of the targets.

[0040] First, a weighted calculation is performed on the tracking affiliation degree (the core indicator for target differentiation) and the trajectory convergence (the indicator for motion matching) to obtain the tracking score for a single matching relationship.

[0041] That is, a tracking score is computed for each single pair (drone, target to be tracked), and the optimal scheme is then selected by traversing the candidate assignment schemes.

[0042] Here, the tracking affiliation degree characterizes the degree of feature matching between the target to be tracked and the corresponding drone, reflecting how well the two fit in appearance and features; the trajectory convergence characterizes how well the two fit in motion state and trajectory trend. Together, the two indicators constitute the core evaluation criteria for a single matching relationship. By multiplying the tracking affiliation degree by the trajectory convergence, or by setting weight coefficients according to actual tracking needs and then taking the weighted sum of the two indicators, a tracking score is obtained for each pair of drone and target to be tracked. This score directly quantifies the rationality and stability of that single matching relationship.

[0043] Subsequently, all preset tracking assignment schemes are iterated. Each scheme corresponds to a set of matching relationships between the shooting drones and the targets to be tracked, in which each shooting drone corresponds to only one target and each target is tracked by only one shooting drone, avoiding duplicate or missed tracking. During the iteration, the sum of all single-pair tracking scores under each assignment scheme is calculated one by one. Finally, the assignment scheme with the smallest sum of tracking scores is selected as the optimal tracking matching scheme. The core of the selection logic is that the smaller the sum of tracking scores, the smaller the overall matching error and the higher the tracking stability. This effectively avoids target loss or mistracking due to matching deviation, ensuring the accuracy and reliability of the tracking process and providing a clear execution basis for subsequent stable tracking.
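The scoring and traversal of step S6 can be sketched as follows, assuming equal numbers of shooting drones and targets and a precomputed square score matrix. The function names and the optional weight parameters are hypothetical, and exhaustive enumeration is used because the text describes iterating over all schemes; it grows factorially and suits only small swarms.

```python
from itertools import permutations
from typing import List, Optional, Tuple


def tracking_score(affiliation: float, convergence: float,
                   w_aff: Optional[float] = None,
                   w_conv: Optional[float] = None) -> float:
    """Product by default; weighted sum when both weight coefficients are given."""
    if w_aff is None or w_conv is None:
        return affiliation * convergence
    return w_aff * affiliation + w_conv * convergence


def best_assignment(scores: List[List[float]]) -> Tuple[List[int], float]:
    """Return (targets, total): targets[i] is the target assigned to drone i.

    scores[i][j] is the tracking score of drone i paired with target j. Every
    one-to-one scheme is enumerated and the smallest total is kept.
    """
    n = len(scores)
    best_total, best_perm = min(
        (sum(scores[i][j] for i, j in enumerate(perm)), perm)
        for perm in permutations(range(n))
    )
    return list(best_perm), best_total
```

For example, with `scores = [[4, 1, 3], [2, 0, 5], [3, 2, 2]]` the greedy choice of the single cheapest pair (drone 1 with target 1, score 0) is not part of the optimum; the minimum total of 5 comes from assigning targets 1, 0, and 2 to drones 0, 1, and 2.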

[0044] This invention also provides an AI-based visual multi-target tracking system for collaborative drone swarms. As shown in Figure 2, the system includes a processor and a memory. The memory stores computer program instructions which, when executed by the processor, implement the AI visual multi-target tracking method for UAV swarm collaboration according to the first aspect of the present invention. The system also includes other components well known to those skilled in the art, such as a communication bus and a communication interface, whose settings and functions are known in the art and are not described further here.

[0045] It should be noted that those skilled in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the scope of protection of this invention. Therefore, the scope of protection of this patent should be determined by the appended claims.

Claims

1. An AI visual multi-target tracking method for drone swarm collaboration, characterized in that the method comprises: acquiring multi-view template image data of the target to be tracked, as well as real-time video stream data and spatial location data collected by the drone swarm during flight; calculating, based on the multi-view template image data, the tracking saliency of the target under different shooting directions by a regional feature contribution analysis method, wherein the tracking saliency characterizes the identification contribution weight of a local region of the target image when distinguishing different targets; calculating, based on the spatial location data, the relative spatial angle between the drone and the target to be tracked, and determining the tracking weight under the current shooting view by combining the tracking saliency; performing feature extraction on the real-time video stream data, calculating the appearance matching degree between the real-time image features and the multi-view template image data, and calculating the tracking affiliation degree by combining the tracking weight; calculating, based on the time series of the spatial location data, the motion trajectory convergence between the drone and the target to be tracked; and constructing a multi-target tracking cost function by combining the tracking affiliation degree and the trajectory convergence, determining the optimal tracking matching scheme by minimizing the cost function, and controlling the drone swarm to perform collaborative tracking of the target.

2. The AI visual multi-target tracking method for drone swarm collaboration according to claim 1, characterized in that the step of calculating the tracking saliency of the target under different shooting directions includes: dividing the template image of the target to be tracked into an overall region and several sub-regions; calculating a first feature distribution difference value between the target to be tracked and an interfering target in the overall region; calculating a second feature distribution difference value between the target to be tracked and the interfering target in the region remaining after any one sub-region is removed; and determining the tracking saliency of that sub-region based on the difference between the first feature distribution difference value and the second feature distribution difference value.

3. The AI visual multi-target tracking method for drone swarm collaboration according to claim 1, characterized in that the step of calculating the appearance matching degree between the real-time image features and the multi-view template image data includes: extracting edge pixel feature sets from the real-time video stream images and the template images, respectively, using edge detection operators; constructing a similarity matrix between the edge features of the real-time image and the edge features of the template image; and using an optimal assignment algorithm to find the maximum matching combination in the similarity matrix, and calculating the cumulative value or mean of the local feature similarities in the maximum matching combination as the appearance matching degree.

4. The AI visual multi-target tracking method for drone swarm collaboration according to claim 1, characterized in that the motion trajectory convergence is calculated as follows: performing differential processing on the historical spatial location data of the drone and of the target to be tracked to obtain velocity or acceleration time series; and using a time-series alignment algorithm to calculate the nonlinear distance between the time series of the drone and of the target to be tracked, the nonlinear distance being used as the motion trajectory convergence.

5. The AI visual multi-target tracking method for drone swarm collaboration according to claim 1, characterized in that the step of determining the tracking weight under the current shooting view includes: calculating a cumulative deviation value between the angles that the line connecting the drone and the target to be tracked makes with each axis of the spatial coordinate system and the angles of the preset standard shooting direction; and multiplying the cumulative deviation value by, or fusing it by weighting with, the tracking saliency of the corresponding shooting direction to obtain the tracking weight.

6. The AI visual multi-target tracking method for drone swarm collaboration according to claim 1, characterized in that the step of constructing the multi-target tracking cost function includes: calculating the product or weighted sum of the tracking affiliation degree and the trajectory convergence to obtain the tracking score for each pair of drone and target to be tracked; and iterating over all preset tracking assignment schemes and selecting the scheme with the smallest sum of tracking scores as the optimal tracking matching scheme.

7. The AI visual multi-target tracking method for drone swarm collaboration according to claim 1, characterized in that the multi-view template image data consists of grayscale template images of the target to be tracked under six preset standard shooting directions: up, down, left, right, front, and back; and the real-time video stream data is processed by continuous frame extraction and grayscale conversion to obtain a flight grayscale image sequence for tracking.

8. An AI visual multi-target tracking system for drone swarm collaboration, characterized in that the system comprises: a processor and a memory, wherein the memory stores computer program instructions that, when executed by the processor, implement the AI visual multi-target tracking method for drone swarm collaboration according to any one of claims 1-7.
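For illustration only, the trajectory convergence calculation of claim 4 can be sketched as below. The claim does not name a specific time-series alignment algorithm; dynamic time warping (DTW) with a Euclidean local cost is assumed here as one common choice. The function names `velocities` and `dtw_distance`, the sampling interval `dt`, and the sample tracks are hypothetical.

```python
def velocities(positions, dt=1.0):
    """First-order differences of (x, y, z) samples, giving velocity vectors."""
    return [
        tuple((b - a) / dt for a, b in zip(p, q))
        for p, q in zip(positions, positions[1:])
    ]


def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of vectors."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # acc[i][j] = minimal accumulated cost of aligning seq_a[:i] with seq_b[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = sum(
                (a - b) ** 2 for a, b in zip(seq_a[i - 1], seq_b[j - 1])
            ) ** 0.5
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


# Made-up tracks: the drone and target A move in parallel, target B does not.
drone = [(0, 0, 10), (1, 0, 10), (2, 0, 10), (3, 0, 10)]
target_a = [(5, 5, 0), (6, 5, 0), (7, 5, 0), (8, 5, 0)]
target_b = [(0, 0, 0), (0, 1, 0), (0, 2, 0), (0, 3, 0)]

d_a = dtw_distance(velocities(drone), velocities(target_a))
d_b = dtw_distance(velocities(drone), velocities(target_b))
```

Under this assumption a smaller distance means the two motion patterns agree more closely, which is consistent with selecting the assignment scheme that minimizes the summed tracking scores.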