The invention discloses a
crowd counting and localization method based on an encoder-decoder structure, relates to the field of
computer vision, and solves the problems in the prior art that features are not fully utilized and that a label map cannot serve both the counting and the localization tasks well. A multi-scale
feature fusion module is introduced into the deep layers of the network, and a spatial-channel attention up-sampling module is introduced into the decoding part. The multi-scale
feature fusion module captures features at multiple scales by using dilated
convolutions with different dilation rates and performs
feature fusion, which improves the robustness of the network to scale variation; the spatial-channel attention up-sampling module uses high-level
semantics to guide the efficient fusion of shallow-layer features, which reduces interference from redundant features and image backgrounds. In addition, a new
label map is provided, which retains the
simple counting of a density map while also offering the localization performance of an FIDT (focal inverse distance transform) map.
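
The multi-scale feature fusion described above can be illustrated with a minimal one-dimensional sketch in plain Python. It is an illustration under stated assumptions, not the claimed implementation: the function names `dilated_conv1d` and `multi_scale_fusion`, the dilation rates (1, 2, 3), the shared kernel, and fusion by simple averaging are all hypothetical choices made here, since the text above does not specify them. The sketch only shows how the same kernel, applied with different dilation rates, covers different spatial extents before the branch outputs are fused.

```python
def dilated_conv1d(signal, kernel, dilation):
    """1-D dilated convolution (cross-correlation form) with zero padding.

    Assumes an odd-length kernel so the output has the same length as the input.
    """
    k = len(kernel)
    half = (k // 2) * dilation  # padding on each side grows with the dilation rate
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


def multi_scale_fusion(signal, kernel, dilations=(1, 2, 3)):
    """Run one branch per dilation rate and fuse the branches.

    Averaging is an assumed fusion rule, chosen only to keep the sketch short.
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(values) / len(branches) for values in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    kernel = [1, 1, 1]
    for rate in (1, 2, 3):
        # A larger rate spreads the kernel taps further apart.
        print(rate, dilated_conv1d(impulse, kernel, rate))
    print("fused", multi_scale_fusion(impulse, kernel))
```

With a three-tap kernel, a dilation rate of r places the taps r samples apart, so each branch spans 2r + 1 input samples without adding any weights. This is the property that lets parallel branches respond to targets of different sizes.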