Reverse attention model based on multi-scale depth supervision
An attention model and attention technique applied in the field of pedestrian re-identification. It addresses problems such as weakened attention, loss of feature information, and increased network model complexity, and improves timeliness and performance.
Examples
Embodiment 1
[0045] Figure 1 shows the structure of the reverse attention model based on multi-scale depth supervision proposed by the present invention. A ResNet-50 network pre-trained on the ImageNet dataset serves as the backbone and extracts deep features at different levels from pedestrian images. The last spatial downsampling operation, the original global average pooling operation, and the fully connected layer of ResNet-50 are removed, and an average pooling layer and a linear classification layer are then added at the end of the network. The intermediate features produced by the four stages of ResNet-50 are used as input to the attention mechanism module and the reverse attention mechanism module. Figure 2 shows the proposed multi-scale feature learning layer. To reduce the GPU memory occupied during training, only the outputs of the second and third stages are selected to partic...
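The effect of removing the last spatial downsampling operation can be checked with simple arithmetic. The sketch below is illustrative only and is not taken from the application. It assumes the standard ResNet-50 layout (a 7×7 stride-2 stem convolution, a 3×3 stride-2 max pool, and four stages with 256, 512, 1024, and 2048 output channels), and it assumes the 384×128 training image size stated in Embodiment 2 means height 384 and width 128. The function name `resnet50_stage_shapes` and the parameter `last_stride` are hypothetical.

```python
def resnet50_stage_shapes(height, width, last_stride=2):
    """Return (channels, H, W) for the four stages of a standard ResNet-50.

    last_stride=2 is the unmodified network; last_stride=1 models removing
    the last spatial downsampling operation (an assumption about how the
    removal is realized).
    """
    def conv_out(size, kernel, stride, padding):
        # Standard output-size formula for a convolution or pooling layer.
        return (size + 2 * padding - kernel) // stride + 1

    # Stem: 7x7 conv (stride 2, padding 3), then 3x3 max pool (stride 2, padding 1).
    h = conv_out(conv_out(height, 7, 2, 3), 3, 2, 1)
    w = conv_out(conv_out(width, 7, 2, 3), 3, 2, 1)

    shapes = []
    stage_channels = (256, 512, 1024, 2048)
    stage_strides = (1, 2, 2, last_stride)
    for channels, stride in zip(stage_channels, stage_strides):
        # Each stage changes resolution once, through a strided 3x3 conv (padding 1).
        h = conv_out(h, 3, stride, 1)
        w = conv_out(w, 3, stride, 1)
        shapes.append((channels, h, w))
    return shapes


if __name__ == "__main__":
    for name, stride in (("original", 2), ("last downsampling removed", 1)):
        print(name, resnet50_stage_shapes(384, 128, last_stride=stride))
```

Under these assumptions the final stage keeps the 24×8 resolution of the third stage instead of dropping to 12×4, so the features passed to the re-added pooling layer retain more spatial detail.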
Embodiment 2
[0096] In order to verify the effectiveness of the model proposed in this application, this embodiment conducts relevant experimental verification on three large-scale public pedestrian re-identification datasets: Market-1501, CUHK03, and DukeMTMC-reID. The experimental parameter settings and experimental results of the application will be described in detail below.
[0097] Experiment details:
[0098] The network model proposed in this application is implemented in the PyTorch framework. All experiments are run on two TITAN XP graphics cards, and the dimensionality reduction ratio r in the attention mechanism module is set to 16. All training images are resized to 384×128 pixels, and the training set is augmented with random erasing and random horizontal flipping. Each training batch contains 64 images: 16 different pedestrians with 4 images each. Loss function weight coefficient λ ...
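The batch composition described above (16 pedestrians × 4 images = 64 images) is commonly built with an identity-balanced sampler. The following is a minimal sketch of that idea using only the Python standard library. It is not the application's implementation: the function name `sample_pk_batch`, its parameters, and the choice to sample with replacement when an identity has fewer than 4 images are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_pk_batch(labels, num_ids=16, num_per_id=4, rng=None):
    """Pick dataset indices for one batch of num_ids identities x num_per_id images.

    labels[i] is the pedestrian identity of image i.
    """
    rng = rng or random.Random()

    # Group image indices by pedestrian identity.
    index_by_id = defaultdict(list)
    for idx, pid in enumerate(labels):
        index_by_id[pid].append(idx)

    if len(index_by_id) < num_ids:
        raise ValueError("not enough distinct identities for one batch")

    batch = []
    for pid in rng.sample(sorted(index_by_id), num_ids):
        pool = index_by_id[pid]
        if len(pool) >= num_per_id:
            chosen = rng.sample(pool, num_per_id)      # without replacement
        else:
            chosen = rng.choices(pool, k=num_per_id)   # assumption: repeat images
        batch.extend(chosen)
    return batch


if __name__ == "__main__":
    # Toy dataset: 20 identities with 6 images each.
    toy_labels = [i // 6 for i in range(120)]
    batch = sample_pk_batch(toy_labels, rng=random.Random(0))
    print(len(batch), len({toy_labels[i] for i in batch}))
```

Grouping each batch by identity guarantees that every sampled pedestrian has several images of the same person available, which losses that compare same-identity and different-identity pairs rely on.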


