A video moving-target tracking method and device based on a two-branch Siamese network
Siamese network and moving-target technology, applied to biological neural network models, neural learning methods, image analysis, etc.
Examples
Embodiment 1
[0072] The overall design of the algorithm is shown in Figure 2, where 1 denotes the z-response map of the last layer, 2 denotes the x-response map of the last layer, and * denotes cross-correlation. The framework contains two sub-network branches: a semantic branch and an appearance branch. The semantic branch uses an improved version of CIRes22, whose structure is given in Table 1; the appearance branch uses the standard AlexNet. A channel attention module is embedded on the last-layer output feature map of the template image z in the semantic branch, and an adaptive spatial masking strategy is applied to the last-layer output feature map of the template image z in the appearance branch. Finally, each branch outputs the APCE value of its response map, and a weighted average is taken to obtain the final response map. The position c...
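The fusion step can be sketched as follows. This is a minimal illustration, not the patent's implementation: it uses the commonly cited definition of APCE (average peak-to-correlation energy), namely the squared peak-to-minimum range of the response map divided by the mean squared deviation from the minimum. The excerpt does not state how the weights are derived, so normalizing the two APCE values into weights is an assumption, and the names `apce` and `fuse` are hypothetical.

```python
def apce(response):
    """APCE of a 2-D response map given as a list of rows.

    Uses the common definition (f_max - f_min)^2 / mean((f - f_min)^2).
    """
    values = [v for row in response for v in row]
    f_max, f_min = max(values), min(values)
    energy = sum((v - f_min) ** 2 for v in values) / len(values)
    if energy == 0.0:
        return 0.0  # completely flat map: no peak to speak of
    return (f_max - f_min) ** 2 / energy


def fuse(resp_semantic, resp_appearance):
    """Weighted average of two equally sized response maps.

    ASSUMPTION: weights are the APCE values normalized to sum to 1.
    Returns the fused map and the (semantic, appearance) weights.
    """
    a_sem, a_app = apce(resp_semantic), apce(resp_appearance)
    total = a_sem + a_app
    w_sem = 0.5 if total == 0.0 else a_sem / total
    w_app = 1.0 - w_sem
    fused = [
        [w_sem * s + w_app * a for s, a in zip(row_s, row_a)]
        for row_s, row_a in zip(resp_semantic, resp_appearance)
    ]
    return fused, (w_sem, w_app)


# Toy maps: a sharp single peak and a broad, low-contrast peak.
sharp = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
broad = [[0.4, 0.5, 0.4], [0.5, 0.6, 0.5], [0.4, 0.5, 0.4]]
fused_map, weights = fuse(sharp, broad)
print(weights)
```

Under this weighting, the branch whose response map has the sharper, more confident peak contributes more to the fused map.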
Embodiment 2
[0074] Building on Embodiment 1: a single-branch Siamese network uses one network to extract the semantic features output by the last convolutional layer for the target, and ignores the target's appearance information. However, appearance information is also important for recognizing the target. This embodiment therefore designs a tracker based on a multi-branch Siamese network that uses two different networks to extract the target's appearance information and semantic information, respectively. Specifically, the improved version of CIRes22 serves as the extractor of semantic features, and AlexNet serves as the extractor of appearance information. CIRes22 is an improved network based on ResNet. Compared with AlexNet, which has only 5 layers, the features extracted by its last layer have significantly better semantic discrimination. Therefore, using CIRes22 and AlexNet to extract the semantic and appearance informa...
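Within each branch, the template features are matched against the search features by cross-correlation (the * operation of Embodiment 1). The sketch below shows that operation on a single channel in plain Python. It is a toy under stated assumptions: the real branches operate on multi-channel feature maps from CIRes22 and AlexNet, and `cross_correlate` is a hypothetical helper name.

```python
def cross_correlate(search, template):
    """Slide `template` over `search` without padding.

    Both arguments are 2-D lists (single channel). Each output entry is the
    inner product of the template with the search window at that offset.
    """
    s_h, s_w = len(search), len(search[0])
    t_h, t_w = len(template), len(template[0])
    response = []
    for r in range(s_h - t_h + 1):
        row = []
        for c in range(s_w - t_w + 1):
            row.append(sum(
                search[r + i][c + j] * template[i][j]
                for i in range(t_h)
                for j in range(t_w)
            ))
        response.append(row)
    return response


# Toy single-channel "features": the template pattern sits at row 1, column 2.
template = [[1, 2], [3, 4]]
search = [
    [0, 0, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 3, 4],
    [0, 0, 0, 0],
]
response = cross_correlate(search, template)
print(response)
```

In a two-branch tracker, each branch would apply this operation to its own pair of feature maps, producing one response map per branch.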
Embodiment 3
[0081] Building on Embodiment 1, this embodiment considers two parameters of the convolutional network, padding and stride, and modifies the initial network with respect to both.
[0082] Padding can cause positional bias during model training. Specifically, when the target moves to the edge of the image and the network contains padding operations, the features extracted by the network contain both the original target and the padded region at the edge. For the candidate regions in the search image, however, some contain only the target itself, while others contain the target plus padding. This leads to inconsistency between the template image and the search region, so the final output response cannot truly reflect the similarity of the input image pair. Fortunately, when the object is close to the center of the image, padding has no adverse effect. In order to solve the interferen...
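The inconsistency described above can be reproduced with a toy 1-D example. This is only an illustration under simplifying assumptions (one 3-tap filter, integer signals, hypothetical helper names `conv1d` and `mismatch`), not the patent's network. The same target is placed at the center and at the edge of a search signal, and its features are compared with those of the cropped template, with and without zero padding.

```python
def conv1d(signal, kernel, pad):
    """Correlate a 1-D signal with a kernel.

    With pad=True the signal is zero-padded so the output keeps its length;
    with pad=False only fully overlapping windows are evaluated.
    """
    half = len(kernel) // 2
    x = [0] * half + list(signal) + [0] * half if pad else list(signal)
    n_out = len(x) - len(kernel) + 1
    return [
        sum(x[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(n_out)
    ]


def mismatch(template, search, start, kernel, pad):
    """Squared distance between the template's features and the features of
    the search window that begins at index `start`."""
    t_feat = conv1d(template, kernel, pad)
    s_feat = conv1d(search, kernel, pad)
    window = s_feat[start:start + len(t_feat)]
    return sum((a - b) ** 2 for a, b in zip(t_feat, window))


kernel = [1, 1, 1]
target = [1, 2, 1]                       # template crop
center_search = [5, 5, 1, 2, 1, 5, 5]    # target at indices 2..4
edge_search = [1, 2, 1, 5, 5, 5, 5]      # target at indices 0..2

for pad in (False, True):
    d_center = mismatch(target, center_search, 2, kernel, pad)
    d_edge = mismatch(target, edge_search, 0, kernel, pad)
    print(f"pad={pad}: center mismatch={d_center}, edge mismatch={d_edge}")
```

Without padding, the identical target yields identical features wherever it sits, so the mismatch is zero in both positions. With zero padding, the mismatch is nonzero and depends on the target's position, which is the kind of position-dependent bias the paragraph attributes to padding.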