The invention belongs to the field of computer-vision pattern recognition and video image processing, and provides a method for recognizing human behavior in video. The method comprises two stages. In the first stage, a video human behavior recognition model is established by constructing three-dimensional space-time sub-frame cubes, establishing a human behavior feature space, performing clustering, updating labels, and building the recognition model. In the second stage, human behavior is recognized by extracting three-dimensional space-time sub-frame cubes from surveillance video, extracting human behavior features, determining the category of the human sub-behavior in each video, and classifying and merging the videos that carry sub-category labels. Compared with the highest recognition accuracy currently reported internationally on the Hollywood2 human behavior database, the recognition accuracy is increased by 16.5%. The invention can therefore automatically extract human behavior features that are more discriminative, adaptive, universal and invariant; it reduces over-fitting and gradient-diffusion (vanishing-gradient) problems in the neural network, and effectively improves the accuracy of human behavior recognition in complex environments. It can be widely used in on-site video surveillance and video content retrieval.
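As an illustration of the cube-extraction step only, the following minimal sketch cuts a grayscale video into fixed-size space-time blocks. The function name `extract_spacetime_cubes`, the cube size, the stride, and the nested-list video layout `video[t][y][x]` are all assumptions made for this example; the source does not specify them, and the feature learning, clustering, and label-update steps are not shown.

```python
def extract_spacetime_cubes(video, cube_shape=(4, 8, 8), stride=(4, 8, 8)):
    """Cut a grayscale video into flattened space-time cubes.

    video      -- nested list indexed as video[t][y][x] (assumed layout)
    cube_shape -- (frames, rows, cols) of each cube (illustrative values)
    stride     -- step between cube origins along (t, y, x) (illustrative)

    Returns a list of cubes, each flattened to a list of pixel values
    in (t, y, x) order. Partial cubes at the borders are dropped.
    """
    ct, ch, cw = cube_shape
    st, sh, sw = stride
    n_frames = len(video)
    height = len(video[0]) if n_frames else 0
    width = len(video[0][0]) if height else 0

    cubes = []
    for t in range(0, n_frames - ct + 1, st):
        for y in range(0, height - ch + 1, sh):
            for x in range(0, width - cw + 1, sw):
                # Gather the ct x ch x cw block starting at (t, y, x).
                cube = [
                    video[t + dt][y + dy][x + dx]
                    for dt in range(ct)
                    for dy in range(ch)
                    for dx in range(cw)
                ]
                cubes.append(cube)
    return cubes


if __name__ == "__main__":
    # Toy 4-frame, 4x4-pixel video whose pixel value encodes its position.
    toy = [[[t * 100 + y * 10 + x for x in range(4)] for y in range(4)]
           for t in range(4)]
    blocks = extract_spacetime_cubes(toy, cube_shape=(2, 2, 2), stride=(2, 2, 2))
    print(len(blocks), "cubes of", len(blocks[0]), "values each")
```

Each flattened cube could then serve as one input vector to whatever feature extractor is used; an array library would normally replace the nested lists for real video.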