A computer-implemented method for deriving a fingerprint from video data is disclosed, comprising the steps of: receiving a plurality of frames from the video data; selecting at least one key frame from the plurality of frames, the at least one key frame being selected from two consecutive frames of the plurality of frames that exhibit a maximal cumulative difference in at least one spatial feature of the two consecutive frames; detecting at least one 3D spatio-temporal feature within the at least one key frame; and encoding a spatio-temporal fingerprint based on the mean luminance of the at least one 3D spatio-temporal feature. The at least one spatial feature can be intensity. The at least one 3D spatio-temporal feature can be at least one Maximally Stable Volume (MSV). Also disclosed is a method for matching video data to a
database containing a plurality of video fingerprints of the type described above, comprising the steps of: calculating at least one fingerprint representing at least one query frame from the video data; indexing into the database using the at least one calculated fingerprint to find a set of candidate fingerprints; applying a score to each of the candidate fingerprints; selecting a subset of the candidate fingerprints as proposed frames by rank-ordering the candidate fingerprints; and attempting to match at least one fingerprint of at least one proposed frame based on a comparison of gradient-based descriptors associated with the at least one query frame and the at least one proposed frame.
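The fingerprint-derivation steps can be illustrated with a minimal Python sketch. It rests on several assumptions and is not the disclosed implementation. Frames are plain 2D lists of grayscale intensities, and intensity is the spatial feature. The key frame is taken to be the second frame of the consecutive pair with the largest cumulative intensity difference. MSV detection is out of scope, so the caller supplies the 3D volumes as lists of (frame, row, col) voxels. The abstract says only that the fingerprint is based on mean luminance. The one-bit-per-volume encoding in the hypothetical `encode_fingerprint` helper is an illustrative choice of our own.

```python
from typing import List, Sequence, Tuple

Frame = List[List[float]]      # grayscale intensities, rows x cols
Voxel = Tuple[int, int, int]   # (frame index, row, col)


def cumulative_intensity_difference(a: Frame, b: Frame) -> float:
    """Sum of absolute per-pixel intensity differences between two frames."""
    return sum(
        abs(pa - pb)
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def select_key_frame(frames: Sequence[Frame]) -> int:
    """Return the index of the key frame.

    Assumption: the key frame is the second frame of the consecutive pair
    whose cumulative intensity difference is maximal (first pair wins ties).
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    return max(
        range(1, len(frames)),
        key=lambda i: cumulative_intensity_difference(frames[i - 1], frames[i]),
    )


def encode_fingerprint(frames: Sequence[Frame],
                       volumes: Sequence[Sequence[Voxel]]) -> List[int]:
    """Hypothetical encoding: one bit per volume (e.g. a pre-detected MSV).

    A bit is 1 if the volume's mean luminance exceeds the average of all
    volume means, else 0. Each volume must contain at least one voxel.
    """
    if not volumes:
        raise ValueError("need at least one volume")
    means = []
    for vol in volumes:
        values = [frames[t][r][c] for t, r, c in vol]
        means.append(sum(values) / len(values))
    overall = sum(means) / len(means)
    return [1 if m > overall else 0 for m in means]
```

In a fuller pipeline, the volumes passed to `encode_fingerprint` would come from an MSV detector run around the frame returned by `select_key_frame`. Here they are simply given, which keeps the sketch focused on the selection and luminance-encoding steps.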
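The matching steps (index, score, rank, verify) can be sketched in the same way. The abstract does not specify the following details, so each is an assumption. The database is a hypothetical dict mapping bit-tuple fingerprints to hypothetical `Entry` records. "Indexing" is a linear scan that accepts fingerprints within `max_bits` Hamming distance of the query. The score is the number of agreeing bits. The gradient-based descriptors (for example, SIFT-like vectors) are assumed to be precomputed and are compared by Euclidean distance against a hypothetical threshold `max_desc_dist`.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

Fingerprint = Tuple[int, ...]


@dataclass
class Entry:
    """Hypothetical database record for one fingerprinted frame."""
    video_id: str
    frame_index: int
    descriptor: Tuple[float, ...]  # gradient-based descriptor, assumed precomputed


def hamming(a: Fingerprint, b: Fingerprint) -> int:
    """Number of positions at which two equal-length fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


def match_query(
    query_fp: Fingerprint,
    query_desc: Sequence[float],
    database: Dict[Fingerprint, List[Entry]],
    max_bits: int = 1,
    top_k: int = 3,
    max_desc_dist: float = 0.5,
) -> Optional[Entry]:
    # 1. Index (assumed linear scan): candidates within max_bits of the query.
    candidates = [
        (fp, entry)
        for fp, entries in database.items()
        if len(fp) == len(query_fp) and hamming(fp, query_fp) <= max_bits
        for entry in entries
    ]
    # 2. Score (assumption): number of bits that agree with the query.
    scored = [(len(query_fp) - hamming(fp, query_fp), entry)
              for fp, entry in candidates]
    # 3. Rank-order by score, highest first, and keep top_k as proposed frames.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    proposed = [entry for _, entry in scored[:top_k]]
    # 4. Verify: accept the proposal whose descriptor is closest to the
    #    query descriptor, provided it lies within max_desc_dist.
    best: Optional[Entry] = None
    best_dist = max_desc_dist
    for entry in proposed:
        dist = math.dist(query_desc, entry.descriptor)
        if dist <= best_dist:
            best, best_dist = entry, dist
    return best
```

The cheap fingerprint lookup narrows the search before the more expensive descriptor comparison runs. A real system could swap the linear scan for a proper index without changing the index, score, rank, and verify flow.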