Content-based clustering, recognition, classification and search of high volumes of multimedia data in real time. The invention is directed to fast, real-time generation of signatures for high volumes of multimedia content segments, based on relevant audio and visual signals, and to scalable matching of those signatures against a high-volume database of content-segment signatures. The invention can be implemented in any application that involves large-scale content-based clustering, recognition and classification of multimedia data, such as content tracking, video filtering, multimedia taxonomy generation, video fingerprinting, speech-to-text, audio classification, object recognition and video search, and in any other application requiring content-based signature generation and matching for large content volumes, such as the web and other large-scale databases.
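
By way of illustration only, the general idea of generating a compact signature for a content segment and matching it against a database of signatures can be sketched as follows. The abstract does not disclose how the signatures are computed, so this sketch substitutes a generic technique (random-projection sign hashing with Hamming-distance matching). The function names, the feature vectors and the 64-bit signature length are all hypothetical and are not taken from the invention.

```python
import random

def make_projections(dim, bits, seed=0):
    """Build `bits` random projection vectors of length `dim` (assumed scheme)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]

def signature(features, projections):
    """Pack the signs of the projections of `features` into one integer."""
    sig = 0
    for row in projections:
        dot = sum(w * x for w, x in zip(row, features))
        sig = (sig << 1) | (1 if dot >= 0 else 0)
    return sig

def hamming(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")

def best_match(query_sig, database):
    """Return (segment_id, distance) of the closest stored signature."""
    seg_id, sig = min(database.items(), key=lambda kv: hamming(query_sig, kv[1]))
    return seg_id, hamming(query_sig, sig)

# Hypothetical feature vectors standing in for audio/visual descriptors.
projections = make_projections(dim=4, bits=64)
segments = {
    "clip_a": [0.9, 0.1, -0.3, 0.5],
    "clip_b": [-0.7, 0.8, 0.2, -0.1],
    "clip_c": [0.1, -0.6, 0.9, 0.4],
}
database = {name: signature(vec, projections) for name, vec in segments.items()}

query = signature(segments["clip_a"], projections)
match_id, distance = best_match(query, database)
```

The linear scan in `best_match` is only meant to show the matching step. A large-scale system would need an index over the signatures, which this sketch does not attempt.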