The invention discloses a method for crowd counting and future pedestrian-volume prediction based on video images. The method comprises the following steps: 1, selecting a video image data set with annotation information and applying a Gaussian function at each annotated head position to generate a ground-truth density map; 2, inputting a video frame into a constructed MPDC model to extract a feature map, and mapping the feature map to a crowd estimation density map (DE); and 3, inputting a stack of the obtained DE frames into a constructed Bi-ConvLSTM network, predicting the crowd prediction density map at time T+1, and estimating the number of pedestrians at time T+1. The method adopts a convolutional network based on multi-scale pyramid dilated (atrous) convolution and a Bi-ConvLSTM network with residual connections: crowd estimation density maps are generated from consecutive video frames, the crowd prediction density map of a future frame is then predicted, and the number of people is counted. The method is a new approach aimed at prediction over consecutive video images; it obtains the current crowd density map and the number of people in real time, and also predicts the crowd density map and pedestrian volume of a future frame.
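Step 1 can be illustrated with a minimal sketch of how a density map is built from head annotations and how a count is read back from it. The abstract does not give the kernel parameters, so the fixed `sigma`, the window `radius`, the border renormalization, and the function names below are all assumptions made for illustration, not the disclosed implementation. Plain Python is used to keep the sketch dependency-free; a real pipeline would normally use an array library.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return a (2*radius+1) x (2*radius+1) Gaussian kernel that sums to 1."""
    size = 2 * radius + 1
    kernel = [
        [math.exp(-((x - radius) ** 2 + (y - radius) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]


def density_map(heads, height, width, sigma=2.0, radius=6):
    """Build a density map from (row, col) head annotations.

    Assumption: a fixed sigma for every head. Each head contributes one unit
    of mass; the part of the kernel that falls outside the image is dropped
    and the rest is renormalized, so the map sums to the number of heads.
    """
    kernel = gaussian_kernel(sigma, radius)
    dmap = [[0.0] * width for _ in range(height)]
    for r, c in heads:
        # Collect the kernel cells that land inside the image.
        cells = []
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = r + dy, c + dx
                if 0 <= y < height and 0 <= x < width:
                    cells.append((y, x, kernel[dy + radius][dx + radius]))
        mass = sum(w for _, _, w in cells)
        if mass == 0:
            continue  # head lies too far outside the image to contribute
        for y, x, w in cells:
            dmap[y][x] += w / mass
    return dmap


def crowd_count(dmap):
    """The estimated number of people is the sum over the density map."""
    return sum(map(sum, dmap))


# Hypothetical annotations for a 16 x 32 frame.
heads = [(5, 5), (0, 0), (10, 20)]
dmap = density_map(heads, height=16, width=32)
count = crowd_count(dmap)
```

The same summation applies to the estimated map (DE) from step 2 and to the predicted map at time T+1 from step 3: in each case the pedestrian count is read off as the sum of the density values.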