A passive domain adaptive target detection method and system for unmanned aerial vehicle aerial images

By using inter-class relationship modeling and a dual-confidence pseudo-label distillation mechanism, the problems of class imbalance and low-confidence sample utilization in UAV aerial image target detection are solved, improving detection robustness and cross-domain adaptability, and achieving more efficient target detection results.

CN122049754B · Active · Publication Date: 2026-06-26 · NANJING LES INFORMATION TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: NANJING LES INFORMATION TECH
Filing Date: 2026-04-15
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing UAV aerial image target detection methods suffer from class imbalance, insufficient utilization of low-confidence samples, and insufficiently robust cross-domain features when deployed across domains. In complex scenarios in particular, detection accuracy often falls short of practical needs.

Method used

The method combines inter-class relationship modeling with a dual-confidence pseudo-label distillation mechanism. A dynamically updated class confusion matrix with an exponential moving average update strategy, together with a low-confidence pseudo-label distillation loss, enables quantitative evaluation and dynamic correction of the identification bias between the majority and minority classes. Instance-level mixed data augmentation then improves the generalization detection capability for minority class targets.

Benefits of technology

It significantly improves the robustness of UAV detection in complex scenarios, reduces the labeling dependence of cross-domain deployment, solves the shortcomings of class imbalance adaptation, low-confidence sample utilization and cross-domain feature learning, and provides a more reliable domain adaptation solution.

✦ Generated by Eureka AI based on patent content.

Figure CN122049754B_ABST

Abstract

The application discloses a passive domain adaptive target detection method and system for unmanned aerial vehicle (UAV) aerial images, comprising: inputting source domain and target domain images, applying strong and weak enhancement transformations to the target domain image, and outputting target domain bounding boxes; constructing and iteratively updating a class confusion matrix to estimate the bias between classes; matching highly correlated minority class and majority class instances and applying hybrid enhancement to the corresponding instances; constraining the target detection results; and applying joint constraints to the target detection process so that the model's detection results on the target domain image gradually converge, finally outputting the detection results of targets in the target domain image. The method combines inter-class relationship modeling with a dual-confidence pseudo-label distillation mechanism; by constructing a dynamically updated class confusion matrix with an exponential moving average update strategy, it achieves quantitative evaluation and dynamic correction of the identification bias between majority and minority classes.

Description

Technical Field

[0001] This invention belongs to the field of computer vision and low-altitude application technology, specifically relating to a passive domain adaptive target detection method and system for UAV aerial images.

Background Technology

[0002] Low-altitude aerial target detection is a key technology for achieving autonomous inspection and intelligent environmental perception. With the widespread application of UAV platforms across various industries, traditional detection methods struggle to adapt to complex and ever-changing real-world deployment scenarios. Supervised detection algorithms such as Faster R-CNN and the YOLO series perform excellently on well-labeled benchmark datasets, but they rely heavily on manual annotation and therefore cannot cope with missing annotations when UAVs are deployed across domains. Domain adaptation methods, such as adversarial training and feature alignment, alleviate inter-domain differences to some extent, but require simultaneous access to source and target domain data during processing. This requirement is often difficult to meet in practical applications due to data privacy and transmission costs. More advanced passive (source-free) domain adaptation methods require only a model pre-trained on the source domain, but they still have significant limitations. These methods typically rely on high-confidence pseudo-labels for self-training based on a mean-teacher framework. However, UAV aerial images suffer from severe class imbalance, resulting in low-quality pseudo-labels for minority classes and small-scale targets. Furthermore, most of these methods apply a uniform confidence threshold to all classes, failing to adapt to the differing confidence distributions of different classes under domain shift.

[0003] The challenges of low-altitude aerial target detection also lie in the efficiency of pseudo-label utilization. High-confidence pseudo-labels typically cover easily detectable large-scale targets and majority categories, while low-confidence regions often contain small-scale targets and minority category instances with significant detection value. Most existing passive domain adaptive methods directly discard low-confidence detection results, wasting a large amount of potential supervisory signal. Furthermore, current mainstream class imbalance handling methods, such as resampling and loss weighting, lack explicit modeling of inter-class confusion relationships during model optimization, so they bring limited improvement for false detections among similar categories. These technical bottlenecks severely restrict further improvements in detection accuracy. In real-world UAV flight scenarios with multi-dimensional domain shifts such as weather changes, perspective differences, and scale variations, the detection performance of existing methods often falls short of practical application requirements.

Summary of the Invention

[0004] To address the shortcomings of existing technologies, this invention aims to provide a passive domain adaptive target detection method and system for UAV aerial images. This method combines inter-class relationship modeling with a dual-confidence pseudo-label distillation mechanism. By constructing a dynamically updated class confusion matrix and an exponential moving average update strategy, it achieves quantitative evaluation and dynamic correction of majority and minority class identification biases. Simultaneously, class relationship enhancement enables instance-level mixed data augmentation based on inter-class similarity, while the low-confidence pseudo-label distillation loss fully leverages the supervisory value of difficult samples through an adaptive weighting mechanism. This invention effectively solves the technical problems of existing methods in class imbalance adaptation, low-confidence sample utilization, and robust cross-domain feature learning.

[0005] To achieve the above objectives, the technical solution adopted by the present invention is as follows:

[0006] The present invention provides a passive domain adaptive target detection method for UAV aerial images, comprising the following steps:

[0007] 1) Input source and target domain images, apply strong and weak enhancement transformations to the target domain image, use the teacher model to generate a set of pseudo-labels from the weakly enhanced image, and use this pseudo-label set to supervise the student model, which detects target domain bounding boxes from the strongly enhanced image;

[0008] 2) Construct a category confusion matrix and update it iteratively to identify the similarity between the majority and minority classes and estimate the bias between categories;

[0009] 3) Based on the category correlation information obtained from the category confusion matrix, highly correlated minority and majority class instances are matched, and the corresponding instances are subjected to hybrid enhancement processing to improve the generalization detection capability for minority class targets;

[0010] 4) Design a low-confidence pseudo-label distillation loss to constrain the target detection results;

[0011] 5) Apply joint constraints to the target detection process to guide the model to gradually converge the detection results on the target domain image, and finally output the detection results of the target in the target domain image.

[0012] Further, step 1) specifically includes:

[0013] 11) For cross-domain target detection tasks using UAV aerial images, define the source domain as a labeled dataset in which each source domain image is paired with its annotated bounding boxes and categories; define the target domain as an unlabeled dataset. Apply a weak enhancement transformation to each target domain image to obtain a weakly enhanced image, and apply a strong enhancement transformation to the same target domain image to obtain a strongly enhanced image;

[0014] 12) Use the teacher model to perform forward inference on the weakly enhanced image and generate a set of pseudo-labels, as follows:

[0015] ;

[0016] where each pseudo-label consists of bounding box coordinates, a predicted category, and a confidence score; the teacher model is parameterized by the set of learnable parameters of its backbone network, feature fusion module, and detection head; and the pseudo-labels are indexed up to the total number of pseudo-labels. The student model uses the set of pseudo-labels output by the teacher model as a supervision signal to detect objects in the target domain image, ultimately outputting the bounding boxes of the predicted targets and their corresponding categories. The formula is:

[0017] ;

[0018] where the student model is parameterized by the set of learnable parameters of its backbone network, feature fusion module, and detection head; the teacher model's parameter set is updated from the student model's parameter set by an exponential moving average (EMA), and the formula is:

[0019] ;

[0020] where the decay rate controls the update momentum; the pseudo-labels generated by the teacher model are combined with the predicted output of the student model to calculate the unsupervised loss.
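The teacher update described in step 1) can be sketched as follows. This is a minimal illustration and not the patent's own formula, which is not reproduced in this text: it assumes the standard mean-teacher form, in which each teacher parameter becomes decay × teacher + (1 − decay) × student. The function name `ema_update`, the dictionary layout, and the value `decay=0.9` are hypothetical.

```python
def ema_update(teacher, student, decay=0.99):
    """Blend student parameters into the teacher (assumed standard EMA form).

    teacher, student: dicts mapping a parameter name to a list of floats.
    decay: assumed decay rate; values closer to 1 make the teacher change slowly.
    """
    for name, student_values in student.items():
        teacher_values = teacher[name]
        teacher[name] = [
            decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_values, student_values)
        ]
    return teacher


# Toy parameters standing in for backbone / fusion / head weights.
teacher = {"head.weight": [1.0, 2.0]}
student = {"head.weight": [0.0, 4.0]}
ema_update(teacher, student, decay=0.9)
```

With a decay close to 1, the teacher changes slowly, which is what allows it to produce comparatively stable pseudo-labels while the student is still being optimized.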

[0021] Further, step 2) specifically includes:

[0022] 21) During training, construct a class confusion matrix M between the true class and the predicted class based on labeled image samples from the source and target domains, where the true class comes from the manually labeled ground-truth categories and the predicted class comes from the model's forward-propagation inference. The specific formula is:

[0023] ;

[0024] 22) Normalize the row vector corresponding to each true category to obtain the class-conditional probability distribution, which represents the estimated probability that a given true category is predicted as each category, as follows:

[0025] ;

[0026] where the total number of categories is the upper limit of the index that runs over the individual categories;

[0027] 23) The category confusion matrix is updated iteratively using an exponential moving average (EMA). The update rule is as follows:

[0028] ;

[0029] where the coefficient in the update rule is the smoothing coefficient.
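Steps 21) to 23) can be sketched as follows under stated assumptions. The counting and row normalization follow the text; applying the EMA to the row-normalized batch estimate (rather than to raw counts), starting from an identity matrix, and the value `smoothing=0.9` are assumptions made for illustration. All function names are hypothetical.

```python
def confusion_counts(true_labels, pred_labels, num_classes):
    """Count how often each true class (row) is predicted as each class (column)."""
    counts = [[0.0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, pred_labels):
        counts[t][p] += 1.0
    return counts


def row_normalize(matrix):
    """Turn each row into a class-conditional distribution; empty rows stay zero."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total > 0 else 0.0 for v in row])
    return normalized


def ema_matrix(previous, current, smoothing=0.9):
    """Assumed EMA form: smoothing * previous + (1 - smoothing) * current."""
    return [
        [smoothing * p + (1.0 - smoothing) * c for p, c in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(previous, current)
    ]


# One toy batch with three classes.
true_labels = [0, 0, 1, 1, 1, 2]
pred_labels = [0, 1, 1, 1, 0, 2]
batch = row_normalize(confusion_counts(true_labels, pred_labels, 3))

# Assumed initial state: no confusion (identity matrix).
running = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
running = ema_matrix(running, batch, smoothing=0.9)
```

The running matrix has a fixed size of (number of classes) squared, so its memory cost does not grow with the number of training samples.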

[0030] Furthermore, step 3) specifically includes:

[0031] 31) Based on the category associations captured by the category confusion matrix, identify pairings between easily confused majority and minority categories, and apply hybrid enhancement to instances of these highly similar categories. An instance library is constructed to store target image instances cropped from labeled images. During enhancement, a base instance and a blended instance are selected from this library and fused pixel-wise using the MixUp method at a ratio of λ to generate an enhanced sample and its category vector, as follows:

[0032] ;

[0033] ;

[0034] where the two inputs are the cropped images of the base instance and the blended instance together with their respective categories, and the output label is the class vector of the enhanced instance;

[0035] 32) In the source domain, instances from both the source and target domains are mixed to assist domain adaptation using real labels; in the target domain, instances from the target domain are used first, and instances from the source domain are introduced only when there are insufficient samples of a specific class. A specific class here means a class with few samples or with high confusion in class prediction. This ensures that the enhancement process focuses on the representation of the target domain and improves the model's ability to distinguish minority classes and small-scale targets.
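The instance-level MixUp in step 31) can be sketched as follows. The sketch assumes the usual MixUp form, in which both the pixels and the one-hot labels are blended with the same ratio λ; it also assumes that the two crops have already been resized to the same shape and flattened into lists of pixel values. The names `one_hot` and `mixup_instances` are hypothetical.

```python
def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector."""
    return [1.0 if c == label else 0.0 for c in range(num_classes)]


def mixup_instances(base_crop, mix_crop, base_label, mix_label, lam, num_classes):
    """Blend two equally sized crops and their labels with ratio lam (assumed MixUp form)."""
    if len(base_crop) != len(mix_crop):
        raise ValueError("crops must be resized to the same shape before mixing")
    mixed_crop = [lam * a + (1.0 - lam) * b for a, b in zip(base_crop, mix_crop)]
    soft_label = [
        lam * ya + (1.0 - lam) * yb
        for ya, yb in zip(one_hot(base_label, num_classes),
                          one_hot(mix_label, num_classes))
    ]
    return mixed_crop, soft_label


# Toy 2x2 crops flattened to four pixel values in [0, 1].
base_crop = [0.0, 1.0, 0.5, 0.5]   # minority-class instance (class 2)
mix_crop = [1.0, 1.0, 0.0, 0.5]    # confusable majority-class instance (class 0)
mixed_crop, soft_label = mixup_instances(base_crop, mix_crop, 2, 0, lam=0.75, num_classes=3)
```

The soft label keeps weight on both categories, so the detector is supervised with the mixing proportions instead of a single hard class.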

[0036] Further, step 4) specifically includes:

[0037] 41) The pseudo-label set generated in step 1) is preprocessed by removing background scores, confidence filtering, and non-maximum suppression, yielding a filtered set of pseudo-labels. A high confidence threshold is then applied to this set to extract the set of high-confidence pseudo-labels, as follows:

[0038] ;

[0039] where each element of the preprocessed pseudo-label set is a pseudo-label instance indexed by its target domain image and by its position within that image, and each instance carries a predicted class confidence vector;

[0040] 42) Set a threshold for the intersection over union (IoU) between all pseudo-labels and the set of high-confidence pseudo-labels. Based on this threshold, select the pseudo-labels that do not significantly overlap with the set of high-confidence pseudo-labels. The specific formula is as follows:

[0041] ;

[0042] where the IoU is computed between a pseudo-label instance from the full pseudo-label set and a pseudo-label instance from the high-confidence pseudo-label set, both taken from the same target domain image;

[0043] A background confidence threshold is then set, and the pseudo-labels identified as background are removed to obtain the filtered set of pseudo-labels. The specific formula is as follows:

[0044] ;

[0045] where the thresholded quantity is the background category confidence of each pseudo-label instance in each target domain image;

[0046] The filtered pseudo-labels are then refined: the background score is removed, and the foreground score vector is divided by its L1 norm to amplify the score differences between foreground categories. The formula is:

[0047] ;

[0048] where the result is the class confidence of the optimized foreground score vector;

[0049] A low confidence threshold is applied to obtain the set of low-confidence pseudo-labels. The specific formula is as follows:

[0050] ;

[0051] 43) Design the low-confidence pseudo-label distillation loss. Using the Kullback-Leibler (KL) divergence, measure the difference between the student model's class prediction probability distribution in the region corresponding to each low-confidence pseudo-label and the amplified category distribution. The specific formula for the low-confidence pseudo-label distillation loss function is:

[0052] ;

[0053] where the Kullback-Leibler divergence measures the difference between the predicted class probability distribution and the amplified category distribution; used as a distillation loss, it guides the two to remain consistent in probability space.

[0054] The loss function described above provides robust representations of difficult positive samples for the student model, enhancing the model's ability to represent features across the entire target domain. The design of this loss function effectively utilizes the inter-class relationships implied by low-confidence pseudo-labels, suppresses the influence of their inherent noise, and avoids excessive bias of the model towards easy positive samples.
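The distillation loss in step 43) can be sketched as follows. The patent's exact formula is not reproduced in this text, so the direction of the divergence (amplified target distribution relative to the student prediction), the averaging over regions, and the small `eps` stabilizer are assumptions. The function names are hypothetical.

```python
import math


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) for two discrete distributions of equal length."""
    return sum(
        t * math.log((t + eps) / (p + eps))
        for t, p in zip(target, predicted)
        if t > 0
    )


def low_conf_distill_loss(student_probs, amplified_targets):
    """Average KL divergence over all low-confidence pseudo-label regions.

    student_probs: student class distributions, one per region.
    amplified_targets: L1-normalized foreground distributions from the teacher.
    """
    if not student_probs:
        return 0.0
    total = sum(
        kl_divergence(target, predicted)
        for predicted, target in zip(student_probs, amplified_targets)
    )
    return total / len(student_probs)


# Two low-confidence regions with three foreground classes each.
targets = [[0.6, 0.3, 0.1], [0.5, 0.5, 0.0]]
matched = low_conf_distill_loss(targets, targets)
mismatched = low_conf_distill_loss([[0.1, 0.3, 0.6], [0.9, 0.1, 0.0]], targets)
```

Because the target is a full distribution rather than a hard label, a region in which the teacher hesitates between two similar categories still passes that relationship on to the student.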

[0055] Furthermore, step 5) specifically includes:

[0056] A composite loss function is introduced to apply joint constraints to the target detection process, guiding the student model's detection results on the target domain image to converge gradually and ultimately outputting the detection results of the targets in the target domain image. The composite loss is the total loss function, used to guide the updating of all network parameters. Its high-confidence term uses high-confidence pseudo-labels to improve the model's detection capability through high-confidence predictions of easy positive samples; its low-confidence pseudo-label distillation term guides the student model to focus on learning from difficult positive samples.

[0057] This invention also provides a passive domain adaptive target detection system for UAV aerial images, comprising:

[0058] The detection module takes source and target domain images as input, applies strong and weak enhancement transformations to the target domain image, uses a teacher model to generate a set of pseudo-labels from the weakly enhanced image, and uses this pseudo-label set to supervise the student model, which detects target domain bounding boxes from the strongly enhanced image;

[0059] The class bias estimation module is used to construct the class confusion matrix and update it iteratively, so as to identify the similarity between the majority and minority classes and estimate the bias between categories;

[0060] The hybrid enhancement module is used to match highly correlated minority and majority class instances based on the class correlation information modeled by the class confusion matrix, and perform hybrid enhancement processing on the corresponding instances to improve the generalization detection capability of minority class targets.

[0061] The constraint module is used to design the low-confidence pseudo-label distillation loss and constrain the target detection results;

[0062] The target detection output module is used to apply joint constraints to the target detection process, so as to guide the model to gradually converge the detection results on the target domain image, and finally output the detection results of the target in the target domain image.

[0063] The beneficial effects of this invention are:

[0064] This invention deeply integrates inter-class relationship modeling with a dual-confidence pseudo-label distillation mechanism. It achieves systematic bias correction through dynamic bias quantization, enhances the representation capability of minority classes by leveraging class relationships, and adaptively balances intra-domain and inter-domain feature learning using a cross-domain hybrid strategy, constructing a complete technical framework from bias diagnosis to correction optimization. This invention significantly improves the robustness of UAV detection in complex scenarios, reduces label dependence in cross-domain deployment, and effectively addresses the shortcomings of existing technologies in class imbalance adaptation, low-confidence sample utilization, and cross-domain feature learning. It provides a more reliable domain-adaptive solution for low-altitude aerial target detection. Specific technical effects are as follows:

[0065] (1) Two-way collaborative mechanism of dynamic bias estimation and enhanced guidance: In the modeling of inter-class relationships, not only is bias quantification achieved through confusion matrix, but its output is also directly converted into weight allocation, forming a closed-loop feedback of "estimation-guidance-optimization"; this two-way collaboration ensures the accuracy and timeliness of model bias correction.

[0066] (2) Joint optimization of instance-level hybrid enhancement and soft label supervision: In category relationship enhancement, pixel-level blending is combined with a soft label supervision mechanism that preserves the semantic association information of the blended categories. This joint optimization strategy of "data augmentation + soft supervision" increases the diversity of minority class samples while providing richer semantic supervision signals.

[0067] (3) Differentiated knowledge distillation system for high and low confidence samples: Differentiated utilization strategies are designed for pseudo-labels of different confidence levels: high-confidence samples provide accurate localization and classification supervision, while low-confidence samples convey semantic distribution information through the KL divergence loss; this hierarchical distillation mechanism maximizes the information mined from pseudo-labels and achieves a progressive knowledge transfer from easy to difficult.

[0068] (4) Adaptive balancing strategy for cross-domain instance selection: An intelligent cross-domain instance selection mechanism is designed during the enhancement process. Target domain instances are used first to maintain domain relevance, and source domain instances are intelligently introduced to supplement diversity when there are insufficient target domain samples. This adaptive balancing strategy effectively alleviates the dual challenges of domain offset and sample scarcity.

[0069] (5) Iterative training architecture of online learning and progressive optimization: The whole system adopts an online learning mode. The inter-class relationship matrix is dynamically updated during training, and the enhancement strategy is adaptively adjusted as the bias changes, forming a continuously optimizing iterative process; this architecture enables the model to adapt to different training states and achieve progressive performance improvement.

Attached Figure Description

[0070] Figure 1 is a flowchart illustrating the principle of the method of the present invention.

Detailed Implementation

[0071] To facilitate understanding by those skilled in the art, the present invention will be further described below with reference to embodiments and accompanying drawings. The content mentioned in the embodiments is not intended to limit the present invention.

[0072] As shown in Figure 1, the passive domain adaptive target detection method for UAV aerial images of the present invention comprises the following steps:

[0073] 1) Input source and target domain images, apply strong and weak enhancement transformations to the target domain image, use the teacher model to generate a set of pseudo-labels from the weakly enhanced image, and use this pseudo-label set to supervise the student model, which detects target domain bounding boxes from the strongly enhanced image; specifically, this includes:

[0074] 11) For cross-domain target detection tasks using UAV aerial images, define the source domain as a labeled dataset in which each source domain image is paired with its annotated bounding boxes and categories; define the target domain as an unlabeled dataset. Apply a weak enhancement transformation to each target domain image to obtain a weakly enhanced image, and apply a strong enhancement transformation to the same target domain image to obtain a strongly enhanced image;

[0075] 12) Use the teacher model to perform forward inference on the weakly enhanced image and generate a set of pseudo-labels, as follows:

[0076] ;

[0077] where each pseudo-label consists of bounding box coordinates, a predicted category, and a confidence score; the teacher model is parameterized by the set of learnable parameters of its backbone network, feature fusion module, and detection head; and the pseudo-labels are indexed up to the total number of pseudo-labels. The student model uses the pseudo-label set output by the teacher model as a supervision signal to detect objects in the target domain image, and finally outputs the bounding boxes of the predicted targets and their corresponding categories. The formula is:

[0078] ;

[0079] where the student model is parameterized by the set of learnable parameters of its backbone network, feature fusion module, and detection head; the teacher model's parameter set is updated from the student model's parameter set by an exponential moving average (EMA), and the formula is:

[0080] ;

[0081] where the decay rate controls the update momentum; the pseudo-labels generated by the teacher model are combined with the predicted output of the student model to calculate the unsupervised loss.

[0082] 2) Construct a category confusion matrix and update it iteratively to identify the similarity between the majority and minority classes and estimate the bias between categories; specifically, this includes:

[0083] 21) During training, construct a class confusion matrix M between the true class and the predicted class based on labeled image samples from the source and target domains, where the true class comes from the manually labeled ground-truth categories and the predicted class comes from the model's forward-propagation inference. The specific formula is:

[0084] ;

[0085] 22) Normalize the row vector corresponding to each true category to obtain the class-conditional probability distribution, which represents the estimated probability that a given true category is predicted as each category, as follows:

[0086] ;

[0087] where the total number of categories is the upper limit of the index that runs over the individual categories;

[0088] 23) The category confusion matrix is updated iteratively using an exponential moving average (EMA). The update rule is as follows:

[0089] ;

[0090] where the coefficient in the update rule is the smoothing coefficient;

[0091] This achieves the following:

[0092] Online dynamic quantification of category recognition bias: By constructing and continuously updating the category confusion matrix, it is possible to track the model's tendency to confuse different categories during training in real time, especially to accurately identify the misjudgment patterns between the majority and minority classes, providing a data foundation for subsequent bias correction;

[0093] A stable and adaptive bias estimation mechanism was established: an exponential moving average update strategy was introduced, which not only smoothed out the noise impact caused by batch fluctuations, but also allowed the gradual fusion of historical information and current observations, forming a more robust long-term estimate of class bias and effectively avoiding estimation bias caused by uneven distribution of samples in a single batch.

[0094] It possesses efficient computational and storage characteristics: the maintenance and updating of the confusion matrix only involves matrix addition and weighting operations, with computational overhead far lower than class balancing methods based on feature distance or resampling, and the required storage space is fixed. It does not increase with the number of training samples, making it suitable for resource-constrained drone deployments.
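One way the smoothed confusion matrix could be turned into majority/minority pairings is sketched below. The patent text does not state the selection rule, so every choice here is an assumption: a class counts as a minority when its instance count is below `minority_ratio` times the mean count, and a pair is scored by the sum of the two off-diagonal confusion probabilities. The function name and all numbers are hypothetical.

```python
def confused_minority_pairs(cond_prob, class_counts, minority_ratio=0.5, top_k=3):
    """Rank (minority, majority) class pairs by mutual confusion.

    cond_prob[i][j]: estimated probability that true class i is predicted as j.
    class_counts[i]: number of instances seen for class i.
    Returns up to top_k (minority, majority, score) tuples, highest score first.
    """
    mean_count = sum(class_counts) / len(class_counts)
    minority = {c for c, n in enumerate(class_counts)
                if n < minority_ratio * mean_count}
    pairs = []
    for m in minority:
        for j in range(len(class_counts)):
            if j in minority:
                continue  # only pair a minority class with a majority class
            score = cond_prob[m][j] + cond_prob[j][m]
            if score > 0:
                pairs.append((m, j, score))
    pairs.sort(key=lambda item: item[2], reverse=True)
    return pairs[:top_k]


# Toy statistics for four classes: 0 and 1 are frequent, 2 and 3 are rare.
class_counts = [1000, 900, 40, 30]
cond_prob = [
    [0.95, 0.03, 0.02, 0.00],
    [0.05, 0.90, 0.00, 0.05],
    [0.40, 0.05, 0.55, 0.00],
    [0.00, 0.30, 0.10, 0.60],
]
pairs = confused_minority_pairs(cond_prob, class_counts)
```

In this toy example the rare class 2 is most often mistaken for the frequent class 0, so that pair is ranked first and would be the first candidate for hybrid enhancement.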

[0095] 3) Based on the category correlation information obtained from the category confusion matrix, highly correlated minority and majority class instances are matched, and hybrid enhancement processing is applied to the corresponding instances to improve the generalization detection capability for minority class targets; specifically including:

[0096] 31) Based on the category associations captured by the category confusion matrix, identify pairings between easily confused majority and minority categories, and apply hybrid enhancement to instances of these highly similar categories. An instance library is constructed to store target image instances cropped from labeled images. During enhancement, a base instance and a blended instance are selected from this library and fused pixel-wise using the MixUp method at a ratio of λ to generate an enhanced sample and its category vector, as follows:

[0097] ;

[0098] ;

[0099] where the two inputs are the cropped images of the base instance and the blended instance together with their respective categories, and the output label is the class vector of the enhanced instance;

[0100] 32) In the source domain, instances from both the source and target domains are mixed to assist domain adaptation using real labels; in the target domain, instances from the target domain are used first, and instances from the source domain are introduced only when there are insufficient samples of a specific class. A specific class here means a class with a small sample size or with high confusion in class prediction. This ensures that the enhancement process focuses on the target domain representation and improves the model's ability to distinguish minority classes and small-scale targets.

[0101] This achieves the following:

[0102] Data-driven targeted sample augmentation: Instance pairing is guided by high similarity class pairs identified by the inter-class relationship module, avoiding the blindness of random augmentation and ensuring that hybrid augmentation can effectively improve the model's ability to distinguish easily confused minority classes, thus alleviating the class imbalance problem from the data level.

[0103] A soft supervision signal that preserves semantic relationships was constructed: MixUp was used to generate soft labels instead of hard single labels, which preserved the semantic relationship between the base category and the mixed category, providing the model with more fine-grained supervision information and helping to learn more robust feature representations;

[0104] A domain-adaptive enhancement strategy was designed: target domain instances are used preferentially for mixing in the target domain, and source domain instances are introduced only when the target domain lacks a specific category. This not only ensures the domain relevance of the enhanced samples, but also effectively utilizes the accuracy of the source domain annotation, and achieves the organic integration of source domain knowledge and target domain features.

[0105] Improved robustness to small-scale and occluded targets: Through instance-level hybrid enhancements, especially for minority classes and small-scale targets, the diversity and representativeness of these difficult examples in the training data are effectively increased, thereby significantly improving the model's detection performance for difficult targets in real-world complex scenarios.
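The target-first instance selection described in step 32) can be sketched as follows. The threshold `min_target_instances` that decides when a class has "insufficient samples", the dictionary layout of the instance libraries, and the function name are assumptions made for illustration.

```python
import random


def select_mix_instance(class_id, target_library, source_library,
                        min_target_instances=5, rng=random):
    """Pick an instance of class_id, preferring the target domain.

    target_library / source_library: dicts mapping class id -> list of crops.
    Returns (instance, domain), or (None, None) when no instance exists.
    """
    target_pool = target_library.get(class_id, [])
    if len(target_pool) >= min_target_instances:
        return rng.choice(target_pool), "target"
    source_pool = source_library.get(class_id, [])
    if source_pool:
        # Target domain is short of this class: fall back to source instances.
        return rng.choice(source_pool), "source"
    if target_pool:
        # No source instances either: reuse the few target instances available.
        return rng.choice(target_pool), "target"
    return None, None


# Crops are represented by placeholder strings in this toy example.
target_library = {0: ["t0", "t1", "t2"], 1: ["t3"]}
source_library = {1: ["s0", "s1"]}

frequent = select_mix_instance(0, target_library, source_library, min_target_instances=2)
scarce = select_mix_instance(1, target_library, source_library, min_target_instances=2)
missing = select_mix_instance(2, target_library, source_library, min_target_instances=2)
```

Keeping the fallback explicit makes it easy to log how often source instances are actually needed, which indicates how scarce each class is in the target domain.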

[0106] 4) Design a low-confidence pseudo-label distillation loss to constrain the target detection results; specifically including:

[0107] 41) The pseudo-label set generated in step 1) is preprocessed by removing background scores, confidence filtering, and non-maximum suppression, yielding a filtered set of pseudo-labels. A high confidence threshold is then applied to this set to extract the set of high-confidence pseudo-labels, as follows:

[0108] ;

[0109] where each element of the preprocessed pseudo-label set is a pseudo-label instance indexed by its target domain image and by its position within that image, and each instance carries a predicted class confidence vector;

[0110] 42) Set a threshold for the intersection over union (IoU) between all pseudo-labels and the set of high-confidence pseudo-labels. Based on this threshold, select the pseudo-labels that do not significantly overlap with the set of high-confidence pseudo-labels. The specific formula is as follows:

[0111] ;

[0112] in, A collection of pseudo-tags The Middle In the target domain image, the first An example of a pseudo-label; A set of high-confidence pseudo-labels The Middle In the target domain image, the first An example of a pseudo-label;

[0113] A background confidence threshold is then set, and pseudo-labels identified as background are removed to obtain the filtered pseudo-label set. The specific formula is as follows:

[0114] ;

[0115] where the term denotes the background-class confidence of a pseudo-label instance, indexed by target-domain image and by instance within the image;

[0116] The filtered pseudo-labels are then refined: the background score is removed and the remaining foreground score vector is divided by its L1 norm, which amplifies the score differences between foreground categories. The formula is:

[0117] ;

[0118] where the term denotes the class confidence of the refined (normalized) foreground score vector;
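
The background filtering and L1 normalization described above can be sketched as follows. The sketch assumes that the background score is the last element of each score vector and that a pseudo-label is kept when its background score is below the threshold; both conventions, and the names `tau_bg` and `normalize_foreground`, are assumptions.

```python
from typing import List


def drop_background(score_vectors: List[List[float]],
                    tau_bg: float) -> List[List[float]]:
    """Remove score vectors whose background score (assumed to be the last
    element) is at or above tau_bg."""
    return [s for s in score_vectors if s[-1] < tau_bg]


def normalize_foreground(scores: List[float]) -> List[float]:
    """Drop the background entry and divide the foreground scores by their
    L1 norm, so the foreground scores sum to 1."""
    foreground = scores[:-1]
    l1_norm = sum(abs(s) for s in foreground)
    if l1_norm == 0:
        return foreground  # nothing to rescale
    return [s / l1_norm for s in foreground]
```

For example, the scores 0.25, 0.125, 0.125 with a background score of 0.5 become 0.5, 0.25, 0.25: the ratios between foreground classes are unchanged, but the absolute gaps grow once the background mass is removed.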

[0119] A low confidence threshold is applied to obtain the set of low-confidence pseudo-labels. The specific formula is as follows:

[0120] ;

[0121] 43) A low-confidence pseudo-label distillation loss is designed. A Kullback-Leibler (KL) divergence loss measures the difference between the student model's class prediction probability distribution in the region corresponding to each low-confidence pseudo-label and the amplified class distribution. The specific formula for the low-confidence pseudo-label distillation loss is:

[0122] ;

[0123] where the Kullback-Leibler divergence measures the difference between the predicted class probability distribution and the amplified class distribution; used as a distillation loss, it guides the two distributions to remain consistent in probability space;
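
A minimal sketch of a KL-divergence distillation loss of this kind is shown below. The direction of the divergence (amplified teacher distribution relative to the student prediction), the averaging over low-confidence regions, and the small `eps` added for numerical stability are assumptions, since the source formula is not reproduced here.

```python
import math
from typing import List


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def low_conf_distillation_loss(amplified_dists: List[List[float]],
                               student_dists: List[List[float]]) -> float:
    """Mean KL divergence between each amplified class distribution and the
    student's predicted distribution for the same region."""
    if not amplified_dists:
        return 0.0
    total = sum(kl_divergence(t, s)
                for t, s in zip(amplified_dists, student_dists))
    return total / len(amplified_dists)
```

Because the loss compares full distributions rather than a single hard label, a noisy pseudo-label still conveys which classes the teacher considers plausible.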

[0124] The loss function described above provides robust representations of difficult positive samples for the student model, enhancing the model's ability to represent features across the entire target domain. The design of this loss function effectively utilizes the inter-class relationships implied by low-confidence pseudo-labels, suppresses the influence of their inherent noise, and avoids excessive bias of the model towards easy positive samples.

[0125] This achieves the following:

[0126] Potentially difficult samples are mined and used at a fine granularity: the dual mechanism of IoU overlap screening and background confidence filtering separates, from the pseudo-label candidate boxes, those candidate regions that overlap little with high-confidence detection boxes yet may still contain targets. This avoids the waste of supervision information that occurs when traditional methods treat low-scoring predictions directly as background, and significantly improves the recall rate for small targets, occluded targets, and targets with large inter-domain differences.

[0127] A noise-robust soft supervision mechanism was constructed: KL divergence loss was used as the supervision signal for low-confidence samples, instead of directly using hard classification or regression loss. This effectively alleviated the interference of localization noise and classification noise in low-quality pseudo-labels on model optimization, enabling the student model to learn more discriminative feature representations from the class distribution of difficult samples.

[0128] An adaptive importance weighting strategy was designed: dividing the foreground score vector by its L1 norm amplifies the score differences between foreground categories in low-confidence samples, which strengthens the model's class-discrimination learning on difficult samples and makes the model focus on the intrinsic semantic information of the samples rather than on the absolute confidence score.

[0129] It promotes the model's feature consistency learning for difficult samples: by forcing the student model to maintain consistency with the teacher model in the category prediction distribution in low confidence regions, it enhances the domain invariance and robustness of the model's feature representation, and in particular improves the model's ability to generalize to marginal cases in complex scenarios in the target domain.

[0130] A complete learning loop of "easy example-difficult example" collaborative optimization has been formed: combining strong supervision of high-confidence pseudo-labels with soft distillation of low-confidence pseudo-labels, a progressive learning system from explicit positive samples to potentially difficult samples has been constructed, enabling the model to make full use of all levels of supervision information generated by the teacher model, and achieving more comprehensive and in-depth feature learning of the target domain data distribution.

[0131] 5) Apply joint constraints to the target detection process to guide the model to gradually converge the detection results in the target domain image, and finally output the detection results of the target in the target domain image; specifically including:

[0132] A composite loss function is introduced to apply joint constraints to the target detection process, guiding the student model to gradually converge its detection results on the target domain image and finally output the detection results for targets in the target domain image. The composite loss is the total loss function, used to guide the update of all network parameters. One of its terms uses high-confidence pseudo-labels to improve the model's detection capability through high-confidence predictions of easy positive samples; the other term, the low-confidence pseudo-label distillation loss, guides the student model to focus on learning from difficult positive samples.
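
One simple way to combine the two terms is a weighted sum, sketched below. The weighting factor `distill_weight` and its default of 1.0 are assumptions; the source states only that the total loss combines a high-confidence pseudo-label term with the low-confidence distillation term.

```python
def total_loss(high_conf_loss: float, low_conf_distill_loss: float,
               distill_weight: float = 1.0) -> float:
    """Total loss as the high-confidence pseudo-label loss plus a weighted
    low-confidence distillation loss (the weight is a hypothetical knob)."""
    return high_conf_loss + distill_weight * low_conf_distill_loss
```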

[0133] This invention also provides a passive domain adaptive target detection system for UAV aerial images, comprising:

[0134] The detection module takes source and target domain images as input, applies strong and weak enhancement transformations to the target domain image, uses a teacher model to generate a pseudo-label set from the weakly enhanced image, and uses that pseudo-label set to supervise the student model, which detects target domain bounding boxes from the strongly enhanced image;

[0135] The class bias estimation module is used to construct the class confusion matrix and update it iteratively, in order to identify the similarity between the majority and minority classes and estimate the bias between categories;

[0136] The hybrid enhancement module is used to match highly correlated minority and majority class instances based on the class correlation information modeled by the class confusion matrix, and perform hybrid enhancement processing on the corresponding instances to improve the generalization detection capability of minority class targets.

[0137] The constraint module is used to design the low-confidence pseudo-label distillation loss and constrain the target detection results;

[0138] The target detection output module is used to apply joint constraints to the target detection process, so as to guide the model to gradually converge the detection results on the target domain image, and finally output the detection results of the target in the target domain image.
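
The work of the class bias estimation module can be illustrated with the sketch below, which row-normalizes a confusion matrix into class-conditional probabilities and blends a running matrix with a new batch matrix by exponential moving average. The function names, the nested-list matrix layout (rows are true classes, columns are predicted classes), and the convention that `beta` weights the running matrix are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def row_normalize(confusion: Matrix) -> Matrix:
    """Turn each row of counts into the estimated probability that the true
    class of that row is predicted as each class; empty rows stay zero."""
    normalized = []
    for row in confusion:
        total = sum(row)
        normalized.append([v / total for v in row] if total > 0
                          else [0.0] * len(row))
    return normalized


def ema_update(running: Matrix, batch: Matrix, beta: float) -> Matrix:
    """Exponential moving average: beta * running + (1 - beta) * batch."""
    return [[beta * r + (1.0 - beta) * b for r, b in zip(run_row, batch_row)]
            for run_row, batch_row in zip(running, batch)]
```

Large off-diagonal entries in the normalized matrix indicate class pairs that the model tends to confuse, which is the signal the hybrid enhancement module uses to pair minority and majority classes.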

[0139] Example:

[0140] Experimental conditions: All experiments were implemented in the PyTorch framework on an NVIDIA GeForce RTX 3090 GPU, and the system is built on the open-source object detection framework Detectron2. The Faster R-CNN object detector was used, with VGG-16 and ResNet-101 as the backbone networks of the detection model. The hyperparameters were set as follows: the EMA decay rate ..., and the distribution hyperparameters ... A weak-enhancement/strong-enhancement contrastive learning strategy is applied uniformly to both source and target domain images. During training, the student model is first initialized with 20,000 iterations on labeled source domain data. The weights of the student model are then copied in full to the teacher model as its initial parameters, and during subsequent training the teacher model parameters are updated as the exponential moving average (EMA) of the student weights. Unlabeled target domain data is then introduced and trained jointly with the source domain data for 60,000 cross-domain iterations. Each training batch contains 8 source domain images and 8 target domain images.
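
The training schedule and the EMA teacher update described above can be sketched as follows. Parameters are represented as a plain dictionary of floats for illustration; the function names, the phase labels, and the reading that the cross-domain phase begins immediately after the 20,000 source-only iterations are assumptions. The decay value is left as an argument because it is not given here.

```python
from typing import Dict

BURN_IN_ITERS = 20_000  # source-only initialization of the student


def training_phase(iteration: int) -> str:
    """Return which phase a zero-based iteration index belongs to."""
    return "burn_in" if iteration < BURN_IN_ITERS else "cross_domain"


def ema_teacher_update(teacher: Dict[str, float], student: Dict[str, float],
                       decay: float) -> Dict[str, float]:
    """Teacher <- decay * teacher + (1 - decay) * student, per parameter."""
    return {name: decay * teacher[name] + (1.0 - decay) * student[name]
            for name in teacher}
```

A decay close to 1 makes the teacher change slowly, which keeps its pseudo-labels stable while the student adapts.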

[0141] The experiment was validated on the publicly available UAVDT dataset, which includes three target classes (car, truck, and bus) in daytime, nighttime, and foggy scenarios. To simulate different domains, the original UAVDT dataset was divided into three subsets according to lighting and weather conditions: UAVDT_Day, UAVDT_Night, and UAVDT_Foggy. UAVDT_Day contains 20,892 training images of daytime scenes, UAVDT_Night contains 11,490 validation images of nighttime scenes, and UAVDT_Foggy contains 5,180 validation images of foggy scenes. During model training, the UAVDT_Day dataset was used as the source domain, while the remaining subsets were used as the target domains.

[0142] The evaluation metrics were AP50, AP75, and mean average precision (AP), where AP50 and AP75 are the average precision at IoU thresholds of 0.5 and 0.75, respectively, and AP is the mean of the average precisions computed at IoU thresholds from 0.5 to 0.95 in steps of 0.05. Table 1 compares the performance of the present invention with five baseline models on the UAVDT dataset. The DAST-Det model of the present invention significantly outperforms the comparative models in domain-adaptive object detection on the nighttime and foggy domains, and in particular preserves domain-invariant features better in scenarios of extreme visual degradation (such as dense fog and low-light nighttime). The experimental results support the model's advantages in semi-supervised pseudo-label generation, cross-domain feature alignment, and multi-scale aerial image object detection.
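
The relationship between AP50, AP75, and AP can be illustrated with the sketch below, which lists the ten IoU thresholds and averages per-threshold precision values. The function names are hypothetical, and the per-threshold values passed in are placeholders supplied by the caller, not results from the experiments.

```python
from typing import Dict, List


def ap_iou_thresholds() -> List[float]:
    """IoU thresholds 0.50, 0.55, ..., 0.95 used for the averaged AP metric."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


def mean_over_thresholds(ap_by_threshold: Dict[float, float]) -> float:
    """AP as the mean of the average precisions at each IoU threshold."""
    return sum(ap_by_threshold.values()) / len(ap_by_threshold)
```

AP50 and AP75 are the single entries of such a table at thresholds 0.5 and 0.75.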

[0143] Table 1

[0144]

[0145] This invention has many specific applications. The above description is only a preferred embodiment of this invention. It should be noted that for those skilled in the art, several improvements can be made without departing from the principle of this invention, and these improvements should also be considered within the scope of protection of this invention.

Claims

1. A passive domain adaptive target detection method for UAV aerial images, characterized in that the steps are as follows: 1) input source and target domain images, apply strong and weak enhancement transformations to the target domain image, use the teacher model to generate a pseudo-label set from the weakly enhanced image, and use the pseudo-label set to supervise the student model, which detects and outputs target domain bounding boxes from the strongly enhanced image; 2) construct a class confusion matrix and update it iteratively to identify the similarity between the majority and minority classes and estimate the bias between the classes; 3) based on the class correlation information obtained from the class confusion matrix, match highly correlated minority and majority class instances, and perform hybrid enhancement processing on the corresponding instances; 4) design a low-confidence pseudo-label distillation loss to constrain the target detection results; 5) apply joint constraints to the target detection process to guide the model to gradually converge the detection results in the target domain image, and finally output the detection results of the target in the target domain image; step 1) specifically includes: 11) for cross-domain target detection tasks using UAV aerial images, define the source domain as a labeled dataset consisting of source domain images and, for each source domain image, its annotation, which comprises the bounding boxes and the categories corresponding to that image; define the target domain as an unlabeled dataset; apply a weak enhancement transformation to each target domain image to obtain a weakly enhanced image, and apply a strong enhancement transformation to the same target domain image to obtain a strongly enhanced image; 12) use the teacher model to perform forward inference on the weakly enhanced image and generate a pseudo-label set, as follows: ; where each pseudo-label comprises bounding box coordinates, a predicted category, and a confidence score, the teacher parameter set is the set of learnable parameters of the teacher model's backbone network, feature fusion module, and detection head, and the remaining term is the number of pseudo-labels; the student model uses the pseudo-label set output by the teacher model as a supervision signal to detect the target domain image, and finally outputs the bounding boxes of the predicted targets and their corresponding categories, according to the formula: ; where the student parameter set is the set of learnable parameters of the student model's backbone network, feature fusion module, and detection head; the teacher model's parameter set is updated by an exponential moving average (EMA) of the student model's parameter set, according to the formula: ; where the decay rate controls the update momentum; the pseudo-labels generated by the teacher model are combined with the predicted output of the student model to calculate the unsupervised loss.

2. The passive domain adaptive target detection method for UAV aerial images according to claim 1, characterized in that step 2) specifically includes: 21) during training, construct a class confusion matrix M between the true class and the predicted class from labeled image samples of the source and target domains, where the true class comes from the manually labeled ground truth and the predicted class comes from the model's forward-propagation inference, according to the formula: ; 22) normalize the row vector corresponding to each true class to obtain the class-conditional probability distribution, which represents the estimated probability that a given true class is predicted as each class, as follows: ; where the terms denote the total number of categories and an individual category identified by its index; 23) update the class confusion matrix iteratively using an exponential moving average (EMA), with the following update rule: ; where the coefficient is the smoothing coefficient.

3. The passive domain adaptive target detection method for UAV aerial images according to claim 2, characterized in that step 3) specifically includes: 31) based on the class associations captured by the class confusion matrix, identify pairings between easily confused majority and minority classes, and enhance instances of these highly similar, easily confused classes by mixing; construct an instance library that stores target image instances cropped from labeled images; during enhancement, select a base instance and a mixing instance from this library and fuse them pixel-wise using the MixUp method at a ratio of λ to generate an enhanced sample and its class vector, as follows: ; ; where the terms denote the cropped images of the base instance and the mixing instance, the categories of the base instance and the mixing instance, and the class vector of the enhanced instance; 32) in the source domain, instances from both the source and target domains are used for mixing, and real labels are used to assist domain adaptation; in the target domain, target domain instances are used first, and source domain instances are introduced only when samples of a specific category are insufficient, where a specific category is one with few samples or one with high confusion in class prediction, so that the enhancement process focuses on the target domain representation.

4. The passive domain adaptive target detection method for UAV aerial images according to claim 3, characterized in that step 4) specifically includes: 41) the pseudo-label set generated in step 1) is preprocessed by removing background scores, filtering by confidence, and applying non-maximum suppression, which yields a filtered pseudo-label set; a high confidence threshold is then applied to this filtered set to extract the set of high-confidence pseudo-labels, as follows: ; where the terms denote, in order, the pseudo-label set obtained after preprocessing, a pseudo-label instance in that set (indexed by target-domain image and by instance within the image), and the predicted class-confidence vector of that pseudo-label instance; 42) an intersection-over-union (IoU) threshold is set between every pseudo-label and the set of high-confidence pseudo-labels, and based on this threshold the pseudo-labels that do not significantly overlap with the high-confidence set are selected, as follows: ; where the terms denote a pseudo-label instance in the pseudo-label set and a pseudo-label instance in the high-confidence pseudo-label set, each indexed by target-domain image and by instance within the image; a background confidence threshold is then set, and pseudo-labels identified as background are removed to obtain the filtered pseudo-label set, as follows: ; where the term denotes the background-class confidence of a pseudo-label instance; the filtered pseudo-labels are then refined by removing the background score and dividing the remaining foreground score vector by its L1 norm, which amplifies the score differences between foreground categories, according to the formula: ; where the term denotes the class confidence of the refined foreground score vector; a low confidence threshold is applied to obtain the set of low-confidence pseudo-labels, as follows: ; 43) a low-confidence pseudo-label distillation loss is designed, in which a Kullback-Leibler divergence loss measures the difference between the student model's class prediction probability distribution in the region corresponding to each low-confidence pseudo-label and the amplified class distribution, the specific formula for the low-confidence pseudo-label distillation loss being: ; where the Kullback-Leibler divergence measures the difference between the predicted class probability distribution and the amplified class distribution and, used as a distillation loss, guides the two distributions to remain consistent in probability space.

5. The passive domain adaptive target detection method for UAV aerial images according to claim 1, characterized in that step 5) specifically includes: a composite loss function is introduced to apply joint constraints to the target detection process, guiding the student model to gradually converge its detection results on the target domain image and finally output the detection results for targets in the target domain image; the composite loss is the total loss function, used to guide the update of all network parameters; one of its terms uses high-confidence pseudo-labels to improve the model's detection capability through high-confidence predictions of easy positive samples; the other term, the low-confidence pseudo-label distillation loss, guides the student model to focus on learning from difficult positive samples.

6. A passive domain adaptive target detection system for UAV aerial images, characterized in that it includes: a detection module, which takes source and target domain images as input, applies strong and weak enhancement transformations to the target domain image, uses a teacher model to generate a pseudo-label set from the weakly enhanced image, and uses that pseudo-label set to supervise the student model, which detects target domain bounding boxes from the strongly enhanced image; a class bias estimation module, which constructs the class confusion matrix and updates it iteratively to identify the similarity between the majority and minority classes and estimate the bias between categories; a hybrid enhancement module, which matches highly correlated minority and majority class instances based on the class correlation information modeled by the class confusion matrix, and performs hybrid enhancement processing on the corresponding instances to improve the generalization detection capability for minority class targets; a constraint module, which designs the low-confidence pseudo-label distillation loss and constrains the target detection results; and a target detection output module, which applies joint constraints to the target detection process so as to guide the model to gradually converge the detection results on the target domain image, and finally outputs the detection results of the target in the target domain image.