A multi-feature fusion behavior recognition method based on key frames comprises the following steps: firstly, extracting the joint-point feature vector x(i) of the human body in the i-th frame of a video with the OpenPose human pose estimation library to form a sequence S = {x(1), x(2), ..., x(N)}; secondly, using a K-means
algorithm to obtain K final cluster centers c' = {c'_i | i = 1, 2, ..., K}, extracting the frame closest to each cluster center as a key frame of the video, and obtaining a key frame sequence F = {F_i | i = 1, 2, ..., K}; and then obtaining the RGB information,
optical flow information and skeleton information of the key frames, processing this information, inputting the processed RGB and optical flow information into a two-stream convolutional network model to obtain higher-level feature representations of the RGB and optical flow information, and inputting the skeleton information into a spatio-temporal graph convolutional
network model to construct spatio-temporal graph features of the skeleton; and then fusing the softmax outputs of the networks to obtain the final recognition result. By processing only key frames, the method avoids the problems caused by redundant frames, such as extra time consumption and reduced accuracy, and makes better use of the information in the video to represent behaviors, which further improves recognition accuracy.
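The key-frame selection step can be illustrated with a short sketch. It is a minimal pure-Python version under stated assumptions, not the method's own implementation: each x(i) is treated as a flat list of floats, K-means is plain Lloyd iteration with squared Euclidean distance, and the initial centers are taken from evenly spaced frames so the result is deterministic (the source does not specify an initialization). The function names kmeans and select_key_frames are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two per-frame feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> List[List[float]]:
    """Lloyd's K-means; returns the K final cluster centers c'_i.

    Assumption: centers start at evenly spaced frames (deterministic init).
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of frames")
    centers = [list(points[int((j + 0.5) * n / k)]) for j in range(k)]
    for _ in range(max_iter):
        # Assignment step: attach every frame to its nearest center.
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its member frames.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centers.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)])
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    return centers


def select_key_frames(seq: List[Vector], k: int) -> List[int]:
    """Indices of the frame closest to each final center, in temporal order.

    Two centers may share a nearest frame, so fewer than k indices can be returned.
    """
    centers = kmeans(seq, k)
    nearest = {min(range(len(seq)), key=lambda i: sq_dist(seq[i], c)) for c in centers}
    return sorted(nearest)
```

For a toy sequence of nine one-dimensional features forming three tight groups, [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2], [10.0], [10.1], [10.2]], select_key_frames(seq, 3) returns the middle frame of each group, [1, 4, 7]. Returning indices rather than feature vectors keeps the link back to the video frames, whose RGB, optical flow and skeleton information are needed in the next step.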
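The skeleton branch can be illustrated with one spatial and one temporal step of a graph convolution. This sketch assumes the common ST-GCN-style formulation, in which each frame's joint features X are propagated as D^(-1/2) (A + I) D^(-1/2) X W over the skeleton adjacency A with self-loops, followed by a 1-D convolution along time. The source does not give the layer definition, the names normalized_adjacency, spatial_graph_conv and temporal_conv are hypothetical, and the weights are passed in as plain lists rather than learned.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x m) and an (m x p) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(num_joints: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """D^(-1/2) (A + I) D^(-1/2) for an undirected skeleton graph with self-loops."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:  # bones connect pairs of joints
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]  # degree including the self-loop
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def spatial_graph_conv(x: Matrix, a_hat: Matrix, w: Matrix) -> Matrix:
    """One frame: (joints x C_in) features -> (joints x C_out), followed by ReLU."""
    out = matmul(matmul(a_hat, x), w)
    return [[max(0.0, v) for v in row] for row in out]


def temporal_conv(frames: List[Matrix], kernel: Sequence[float]) -> List[Matrix]:
    """Valid 1-D convolution along time for every joint and channel.

    Simplification: one scalar weight per time offset, shared by all channels.
    """
    t_out = len(frames) - len(kernel) + 1
    joints, channels = len(frames[0]), len(frames[0][0])
    return [[[sum(kernel[k] * frames[t + k][j][c] for k in range(len(kernel)))
              for c in range(channels)]
             for j in range(joints)]
            for t in range(t_out)]
```

A full spatio-temporal graph convolutional network stacks many such layers with learned weights (and typically a finer partition of each joint's neighbors); the sketch only shows the tensor shapes and the symmetric normalization of the skeleton graph.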
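The final fusion step can be sketched as late score fusion. The source states only that the softmax outputs of the networks are fused; this sketch assumes a weighted average of the per-stream probability vectors followed by an argmax, with equal weights by default. The stream names rgb, flow and skeleton and the functions fuse_streams and predict are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax of one stream's class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(stream_logits: Dict[str, Sequence[float]],
                 weights: Optional[Dict[str, float]] = None) -> List[float]:
    """Weighted average of the per-stream softmax outputs (late fusion).

    Assumption: weighted averaging with equal default weights; the source
    only says that the softmax outputs are fused.
    """
    if weights is None:
        weights = {name: 1.0 for name in stream_logits}
    total_w = sum(weights[name] for name in stream_logits)
    num_classes = len(next(iter(stream_logits.values())))
    fused = [0.0] * num_classes
    for name, logits in stream_logits.items():
        probs = softmax(logits)
        for c in range(num_classes):
            fused[c] += weights[name] * probs[c] / total_w
    return fused


def predict(stream_logits: Dict[str, Sequence[float]],
            weights: Optional[Dict[str, float]] = None) -> int:
    """Index of the behavior class with the highest fused probability."""
    fused = fuse_streams(stream_logits, weights)
    return max(range(len(fused)), key=fused.__getitem__)
```

For example, predict({"rgb": [2.0, 1.0, 0.1], "flow": [0.5, 2.5, 0.0], "skeleton": [0.2, 3.0, 0.1]}) returns 1, because the flow and skeleton streams outvote the RGB stream; giving the RGB stream a weight of 10 and the other two a weight of 1 flips the result to 0. Averaging probabilities rather than raw scores keeps streams with different score scales comparable.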