The invention provides a method for detecting and counting the distribution of dense crowds in video. First, a large number of videos containing crowds of different densities are acquired to construct a data set. A deep neural network combining multi-scale feature fusion with an attention mechanism is then constructed; the training set is input into the network, which outputs a predicted crowd density map and a predicted attention map; a loss function is constructed from these predictions together with the real density map and the real attention map; and the network is trained to generate an optimized network. A density map of a crowd video image is then obtained by prediction with the optimized network, and point clustering is further performed on the estimated density map using a grid-based hierarchical density spatial clustering method to identify groups, so that the number of people in each group and the position of each group are obtained quickly. According to the invention, the problems of camera perspective distortion, scale variation and background noise can be solved, and counting precision and stability are improved; meanwhile, because the crowd is divided into groups, the distribution of the crowd can be displayed visually.
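By way of illustration only, the counting step and the combined loss may be sketched as follows. The abstract does not specify the form of the loss function; the sketch assumes a mean squared error term on the density map plus a weighted binary cross-entropy term on the attention map. The function names `count_from_density` and `combined_loss` and the parameter `attention_weight` are hypothetical and do not appear in the original text. The network itself and the grid-based clustering step are not shown.

```python
import numpy as np


def count_from_density(density):
    """Estimate the number of people as the sum of the density map.

    A density map is conventionally built so that it integrates to the
    head count, so summing all pixels yields the estimate.
    """
    return float(np.sum(density))


def combined_loss(pred_density, true_density,
                  pred_attention, true_attention,
                  attention_weight=0.1, eps=1e-7):
    """Assumed loss: density MSE + attention_weight * attention BCE.

    The weighting and both terms are assumptions made for illustration;
    the source only states that the loss combines the real density map
    and the attention map.
    """
    # Pixel-wise mean squared error between predicted and real density maps.
    density_term = np.mean((pred_density - true_density) ** 2)

    # Binary cross-entropy between predicted attention (probabilities in
    # [0, 1]) and the real attention mask; clipping avoids log(0).
    p = np.clip(pred_attention, eps, 1.0 - eps)
    attention_term = -np.mean(
        true_attention * np.log(p)
        + (1.0 - true_attention) * np.log(1.0 - p)
    )

    return float(density_term + attention_weight * attention_term)
```

In such a sketch, the attention term penalizes responses in background regions, while the density term drives the count itself; the relative weight would have to be tuned for a given data set.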