Visual target tracking method based on multi-level aggregation and attention twin network
A twin (Siamese) network target tracking technique, applied in the field of visual target tracking. It addresses problems such as low feature resolution, missing detail and local structure information, and difficulty distinguishing between objects with the same attributes or semantics.
Examples
Embodiment 1
[0051] Figure 1 is a schematic diagram of the overall framework of the multi-level aggregation and attention Siamese network proposed in this embodiment. Most existing trackers rely on the output features of the last layer of the Siamese backbone network to track the target and often ignore the distinct characteristics of features at different levels. This embodiment therefore proposes a new network, the Siamese Multi-Level Aggregation and Attention Network (SiamMLAA), which includes a head attention (HA) module, a multi-layer aggregation (MLA) module and a self-refinement (SR) module. In brief, the head attention module is added to the top convolutional layer of the backbone network to improve the feature representation, using spatial and channel attention to model a wider and richer context for the top-level features. In addition, the multi-layer aggregation module can effectively integrate low-level spatial...
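As a rough illustration of the spatial and channel attention mentioned above, the following Python sketch gates a small feature map first per channel and then per spatial location. It is a parameter-free toy on nested lists. The function names and the average-pooling-plus-sigmoid gating are assumptions made for illustration and are not the HA module of this embodiment; a trained attention module would normally learn its gating weights.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat):
    """Scale each channel by a gate computed from its global average.

    feat is a list of C channels, each an H x W list of floats.
    Hypothetical, parameter-free gating used only for illustration.
    """
    out = []
    for channel in feat:
        values = [v for row in channel for v in row]
        gate = sigmoid(sum(values) / len(values))  # one scalar gate per channel
        out.append([[v * gate for v in row] for row in channel])
    return out


def spatial_attention(feat):
    """Scale each location by a gate computed from the mean over channels."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    gates = [
        [sigmoid(sum(feat[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]
    return [
        [[feat[k][i][j] * gates[i][j] for j in range(w)] for i in range(h)]
        for k in range(c)
    ]


# Toy feature map: 2 channels of size 2 x 2.
features = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
refined = spatial_attention(channel_attention(features))
```

Both functions preserve the shape of the input, so they can be chained in either order. The order used here (channel first, then spatial) is likewise an assumption.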
Embodiment 2
[0116] To verify the practical effect of the visual object tracking method based on the multi-level aggregation and attention twin network proposed in the above embodiment, this embodiment reports experiments on five public tracking benchmark datasets: OTB2013, OTB50, OTB2015, VOT2016 and VOT2017. The results show that the method outperforms the baseline tracker on the evaluation criteria and is highly competitive with existing tracking methods, indicating that the proposed SiamMLAA network performs well across these benchmarks.
[0117] Specifically, the proposed network framework is implemented in PyTorch and trained on four RTX 2080 Ti GPUs.
[0118] The training process is as follows: the backbone network is initialized from a ResNet22 model pre-trained on the ILSVRC classification dataset, the remaining parts of the network are initialized with random noise, and the whole network is trained offline on the target tracking dataset GOT10K. The dataset contains more than 10,000 video clips of moving targets in ...
Embodiment 3
[0128] To verify the effectiveness of each key module designed in the proposed tracker, an ablation study is also carried out in this embodiment. The ablation experiment is carried out on the OTB benchmark, which includes the three datasets OTB2013, OTB50 and OTB2015. The results are shown in Figure 8 and Figure 9, where (a) is the success plot and (b) is the precision plot. It can be seen from these figures that the tracker containing all the modules (i.e. the multi-layer aggregation MLA module, the self-refinement SR module and the head attention HA module) achieves almost the best tracking performance in terms of both precision and success rate, which indicates that each module of the proposed tracker contributes to the final tracking performance.
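For reference, the two OTB-style curves mentioned above can be computed from per-frame bounding boxes as in the following Python sketch. The success rate counts the frames whose overlap (IoU) with the ground truth exceeds a threshold, and the precision counts the frames whose center location error is within a pixel threshold (20 pixels is the commonly reported value). The function names, the (x, y, w, h) box layout and the toy boxes are assumptions for illustration and are not taken from the experiments of this embodiment.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(box_a, box_b):
    """Euclidean distance in pixels between the centers of two boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return math.hypot((ax + aw / 2) - (bx + bw / 2),
                      (ay + ah / 2) - (by + bh / 2))


def success_rate(preds, gts, threshold):
    """Fraction of frames whose IoU exceeds the overlap threshold."""
    hits = sum(1 for p, g in zip(preds, gts) if iou(p, g) > threshold)
    return hits / len(preds)


def precision(preds, gts, threshold=20.0):
    """Fraction of frames whose center error is within the pixel threshold."""
    hits = sum(1 for p, g in zip(preds, gts) if center_error(p, g) <= threshold)
    return hits / len(preds)


def success_auc(preds, gts, steps=21):
    """Mean success rate over evenly spaced overlap thresholds in [0, 1]."""
    thresholds = [i / (steps - 1) for i in range(steps)]
    return sum(success_rate(preds, gts, t) for t in thresholds) / steps


# Toy two-frame sequence (hypothetical boxes).
predicted = [(0, 0, 10, 10), (0, 0, 10, 10)]
ground_truth = [(0, 0, 10, 10), (5, 0, 10, 10)]
auc = success_auc(predicted, ground_truth)
```

Sweeping the threshold in `success_rate` or `precision` and plotting the result gives one curve per tracker variant, which is how ablation variants are usually compared on this benchmark.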
[0129] Table 2: Ablation study of different module combinations on the OTB datasets.
[0130]
[0131...


