The invention discloses a weakly supervised fine-grained image recognition method based on a visual self-attention mechanism. The method comprises a student-model module, a teacher-model module and a classification-model module. The student-model and the teacher-model are coupled through a Teacher-Student loop feedback mechanism based on pairwise learning to rank (the Pairwise Approach), forming a self-attention region proposal network. This strengthens the link between discriminative region localization and fine-grained feature learning, so that discriminative regions in a fine-grained image can still be accurately detected without target bounding boxes or part annotations, and recognition accuracy is markedly improved. Meanwhile, the three modules, namely the student-model, the teacher-model and the classification-model, share the convolutional layers, which effectively reduces the model storage space and the computational cost, allows the method to meet the requirements of real-time recognition tasks, and makes it suitable for large-scale real-world scenes. In addition, a dynamic weight distribution mechanism is adopted in multi-task joint learning to reduce the number of manually set hyperparameters and enhance the robustness of the model. Finally, the whole model is trained end-to-end in a single stage, which reduces the difficulty of network optimization.
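
By way of illustration only, the pairwise ranking idea behind the Teacher-Student feedback loop may be sketched as follows. The sketch assumes that the teacher-model assigns each candidate region a confidence and that the student-model assigns the same region a score; the student is penalized whenever its scores order a pair of regions differently from the teacher. The hinge form of the loss, the `margin` parameter and the name `pairwise_ranking_loss` are assumptions made for this example and are not taken from the disclosure.

```python
from itertools import combinations
from typing import Sequence


def pairwise_ranking_loss(
    student_scores: Sequence[float],
    teacher_confidences: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Hinge-style pairwise ranking loss (illustrative assumption).

    For every pair of candidate regions that the teacher ranks differently,
    the student score of the teacher-preferred region should exceed the
    other region's student score by at least `margin`.
    """
    if len(student_scores) != len(teacher_confidences):
        raise ValueError("one student score is required per teacher confidence")

    loss = 0.0
    for i, j in combinations(range(len(student_scores)), 2):
        if teacher_confidences[i] == teacher_confidences[j]:
            continue  # the teacher expresses no preference for this pair
        # `hi` is the region the teacher prefers, `lo` the other one.
        if teacher_confidences[i] > teacher_confidences[j]:
            hi, lo = i, j
        else:
            hi, lo = j, i
        loss += max(0.0, margin - (student_scores[hi] - student_scores[lo]))
    return loss


# Three hypothetical candidate regions scored by the teacher.
teacher = [0.9, 0.1, 0.5]
agreeing_student = [3.0, 0.0, 1.5]   # same ordering as the teacher
reversed_student = [0.0, 3.0, 1.5]   # opposite ordering

print(pairwise_ranking_loss(agreeing_student, teacher))  # 0.0
print(pairwise_ranking_loss(reversed_student, teacher))  # 9.0
```

In this sketch the loss is zero when the student orders the regions as the teacher does with a sufficient margin, and grows as the two orderings diverge, which is the behaviour a pairwise feedback signal of this kind relies on.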