The invention provides a method for detecting and counting the distribution of dense crowds in video. First, a large number of videos containing crowds of different densities are acquired to construct a data set. A deep neural network combining multi-scale feature fusion with an attention mechanism is then constructed; the training set is input into the network, which outputs a predicted crowd density map and a predicted attention map; a loss function is constructed from these predictions together with the ground-truth density map and attention map, and the network is trained to obtain an optimized network. The optimized network is used to predict a density map for a crowd video image, point clustering is further performed on the estimated density map using a grid-based hierarchical density spatial clustering method to identify groups, and the number of people in each group and its position information are obtained quickly. The invention addresses the problems of camera perspective distortion, scale variation and background noise, thereby improving counting accuracy and stability; in addition, because the crowd is divided into groups, the distribution of the crowd can be displayed visually.
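
The post-processing stage described above (density map to head count, then points to groups) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the claimed implementation: the function names `count_from_density`, `density_to_points` and `grid_cluster`, the peak threshold, and the cell size are hypothetical. In particular, the clustering step is a simplified single-level grid clustering (merging 8-connected occupied cells), which stands in for the grid-based hierarchical density spatial clustering method, whose details the abstract does not give.

```python
from collections import deque

import numpy as np


def count_from_density(density):
    """Estimated head count: the density map integrates to the number of people."""
    return float(np.asarray(density, dtype=float).sum())


def density_to_points(density, threshold):
    """Return (x, y) pixel coordinates of 3x3 local maxima at or above `threshold`."""
    density = np.asarray(density, dtype=float)
    h, w = density.shape
    # Pad with -inf so that border pixels can still qualify as peaks.
    padded = np.pad(density, 1, mode="constant", constant_values=-np.inf)
    is_peak = density >= threshold
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            is_peak &= density >= neighbour
    ys, xs = np.nonzero(is_peak)
    return np.stack([xs, ys], axis=1)


def grid_cluster(points, cell_size):
    """Simplified stand-in clustering: bin points into square cells and merge
    occupied cells that touch (8-connectivity) into one group."""
    cells = {}
    for idx, (x, y) in enumerate(points):
        key = (int(x // cell_size), int(y // cell_size))
        cells.setdefault(key, []).append(idx)

    seen, groups = set(), []
    for start in cells:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:  # breadth-first search over neighbouring occupied cells
            cx, cy = queue.popleft()
            members.extend(cells[(cx, cy)])
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in cells and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        pts = points[members]
        groups.append({"count": len(members), "center": pts.mean(axis=0)})
    return groups


# Toy density map (assumed data): one unit of mass per person, two separated groups.
toy = np.zeros((40, 40))
for x, y in [(5, 5), (7, 6), (6, 8), (30, 30), (32, 31)]:
    toy[y, x] = 1.0

total = count_from_density(toy)
groups = grid_cluster(density_to_points(toy, threshold=0.5), cell_size=5)
for g in groups:
    print(g["count"], g["center"])
```

In a real predicted density map each person appears as a smoothed blob rather than a single pixel, so the threshold and cell size would need tuning to the image resolution and crowd density; the sum over the map still gives the estimated count.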