The invention belongs to the technical field of neural networks, and particularly relates to a crowd counting method based on multi-scale feature fusion. The method mainly comprises the following steps: extracting feature maps at three scales from a backbone network, feeding the feature maps into a feature fusion sub-network, and calculating a density map from the fused feature maps so as to predict the number of people in the image. The feature fusion sub-network is designed as three convolutional network branches of identical structure. Each branch adopts an attention fusion network and is divided into two paths, each composed of a convolution layer, a normalization layer and an activation function; the two paths share the same input but differ in output channel number, one producing a single channel and the other N channels. The single-channel path learns feature weights for the multi-channel path, and these weights are multiplied by the multi-channel output feature map. Finally, the feature maps of the three branches are superposed and sent together to a decoding module, which outputs the image density map; the integral of the density map (the sum of its values over all pixels) is the number of people in the image. The invention thereby improves crowd counting precision.
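The data flow of the fusion sub-network can be illustrated with a minimal, dependency-free Python sketch. It rests on several assumptions that the abstract does not state: the three feature maps have already been resized to a common spatial size; every convolution is 1×1; normalization is per-channel zero-mean/unit-variance over spatial positions; the single-channel path ends in a sigmoid and the N-channel path in a ReLU; "superposed" is read as element-wise addition; and the decoding module is a hypothetical 1×1 convolution followed by a ReLU. The names `AttentionFusionBranch`, `predict` and `count_from_density` are illustrative only, and the weights are random and untrained, so the printed count checks the plumbing, not the accuracy.

```python
import math
import random
from typing import Callable, List, Tuple

# Feature maps are nested lists indexed as [channel][row][column].
Tensor = List[List[List[float]]]


def conv1x1(x: Tensor, w: List[List[float]], b: List[float]) -> Tensor:
    """Pointwise convolution: out[o][i][j] = sum_c w[o][c] * x[c][i][j] + b[o]."""
    h, wd = len(x[0]), len(x[0][0])
    return [[[sum(w[o][c] * x[c][i][j] for c in range(len(x))) + b[o]
              for j in range(wd)] for i in range(h)] for o in range(len(w))]


def normalize(x: Tensor, eps: float = 1e-5) -> Tensor:
    """Assumed normalization: per channel, zero mean and unit variance over all pixels."""
    out = []
    for ch in x:
        vals = [v for row in ch for v in row]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        out.append([[(v - mean) / math.sqrt(var + eps) for v in row] for row in ch])
    return out


def elementwise(x: Tensor, fn: Callable[[float], float]) -> Tensor:
    return [[[fn(v) for v in row] for row in ch] for ch in x]


def relu(v: float) -> float:
    return max(0.0, v)


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


class AttentionFusionBranch:
    """Hypothetical branch: a 1-channel weight path and an N-channel feature path on one input."""

    def __init__(self, in_ch: int, n_out: int, rng: random.Random):
        def rand(rows: int, cols: int) -> List[List[float]]:
            return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]
        self.w_att, self.b_att = rand(1, in_ch), [0.0]
        self.w_feat, self.b_feat = rand(n_out, in_ch), [0.0] * n_out

    def __call__(self, x: Tensor) -> Tensor:
        # Path 1: conv -> norm -> sigmoid yields a single-channel weight map in (0, 1).
        att = elementwise(normalize(conv1x1(x, self.w_att, self.b_att)), sigmoid)[0]
        # Path 2: conv -> norm -> ReLU yields the N-channel feature map.
        feats = elementwise(normalize(conv1x1(x, self.w_feat, self.b_feat)), relu)
        # The weight map rescales every channel of the N-channel output pixel by pixel.
        return [[[v * att[i][j] for j, v in enumerate(row)] for i, row in enumerate(ch)]
                for ch in feats]


def count_from_density(density: List[List[float]]) -> float:
    """Discrete 'integral' of the density map: the sum over all pixels."""
    return sum(v for row in density for v in row)


def predict(scales: List[Tensor], n_out: int = 4,
            seed: int = 0) -> Tuple[List[List[float]], float]:
    rng = random.Random(seed)
    branches = [AttentionFusionBranch(len(s), n_out, rng) for s in scales]
    outs = [branch(s) for branch, s in zip(branches, scales)]
    # Superpose the three branch outputs by element-wise addition (assumed reading).
    fused = [[[sum(o[c][i][j] for o in outs) for j in range(len(outs[0][c][i]))]
              for i in range(len(outs[0][c]))] for c in range(n_out)]
    # Hypothetical decoder: 1x1 conv to one channel, ReLU keeps densities non-negative.
    dec_w = [[rng.uniform(0.0, 1.0) for _ in range(n_out)]]
    density = elementwise(conv1x1(fused, dec_w, [0.0]), relu)[0]
    return density, count_from_density(density)


if __name__ == "__main__":
    data_rng = random.Random(1)

    def random_map(channels: int, size: int = 4) -> Tensor:
        return [[[data_rng.random() for _ in range(size)] for _ in range(size)]
                for _ in range(channels)]

    # Three scales with different channel counts, already resized to 4x4 (assumption).
    density, count = predict([random_map(3), random_map(5), random_map(8)])
    print(len(density), len(density[0]), count)
```

A practical implementation would use a deep-learning framework, larger kernels and trained weights; the sketch only makes the sequence gate, multiply, superpose, decode and sum explicit, with the final sum standing in for the integral of the density map.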