The invention discloses a video key frame extraction method, system and device based on multi-view features. The method comprises the following steps: sampling an original video stream at a set sampling rate to extract it into a plurality of frame images; applying the average hash algorithm (AHA) to all extracted frames and calculating the Hamming distance between the hash values of every two consecutive frames, wherein a shot boundary is determined if the Hamming distance is greater than a threshold, and no shot boundary is placed otherwise; extracting three feature values, namely an RGB feature value, an HSV feature value and an LBP feature value, from each frame image obtained in the sampling step; and, according to the shot division result of the shot division step, performing single-core clustering on the extracted RGB, HSV and LBP feature values within each shot, normalizing the clustering results and summing them, and taking the frame with the minimum sum as the key frame of the shot. The extracted key frames are more representative, the robustness of the algorithm is enhanced, and the readability of the extracted video summary is improved.
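By way of illustration only, the shot division and key frame selection steps might be sketched as follows in Python with NumPy. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names (`average_hash`, `hamming`, `split_shots`, `select_key_frame`), the 8x8 hash size and the threshold of 10 are all hypothetical; block averaging stands in for an image-library resize; the RGB, HSV and LBP feature vectors are assumed to be precomputed; and the clustering step is approximated by the distance of each frame to a single per-shot centroid for each feature type.

```python
import numpy as np

def average_hash(gray, hash_size=8):
    """Return a flat boolean average hash of a 2-D grayscale array.

    Assumes both image dimensions are at least hash_size.
    """
    h, w = gray.shape
    bh, bw = h // hash_size, w // hash_size
    # Crop so the image divides evenly into hash_size x hash_size blocks.
    cropped = gray[:bh * hash_size, :bw * hash_size].astype(np.float64)
    # Block-average downsampling (a stand-in for a library resize).
    small = cropped.reshape(hash_size, bh, hash_size, bw).mean(axis=(1, 3))
    # Each bit records whether a block is brighter than the overall mean.
    return (small > small.mean()).ravel()

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return int(np.count_nonzero(a != b))

def split_shots(frames, threshold=10):
    """Return (start, end) frame index ranges per shot, end exclusive."""
    hashes = [average_hash(f) for f in frames]
    bounds = [0]
    for i in range(1, len(frames)):
        # A large hash distance between adjacent frames marks a boundary.
        if hamming(hashes[i - 1], hashes[i]) > threshold:
            bounds.append(i)
    bounds.append(len(frames))
    return list(zip(bounds[:-1], bounds[1:]))

def select_key_frame(feature_sets):
    """Pick the key frame of one shot.

    feature_sets: list of arrays, one per feature type (e.g. RGB, HSV,
    LBP), each of shape (n_frames, dim). Assumption: "clustering" is
    modelled as distance to a single centroid per feature type.
    """
    total = np.zeros(feature_sets[0].shape[0])
    for feats in feature_sets:
        feats = np.asarray(feats, dtype=np.float64)
        dist = np.linalg.norm(feats - feats.mean(axis=0), axis=1)
        span = dist.max() - dist.min()
        # Min-max normalise so each feature type contributes equally.
        if span > 0:
            total += (dist - dist.min()) / span
    # The frame with the smallest summed score is taken as the key frame.
    return int(np.argmin(total))
```

In use, `split_shots` would be applied to the sampled grayscale frames, and `select_key_frame` would then be called once per shot with that shot's three feature arrays; the returned index is relative to the start of the shot.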