The invention relates to an intelligent video monitoring method for public scenes based on visual saliency and a deep autoencoder. The method decomposes a video into single frames, uses visual saliency to extract motion information, and then calculates the optical flow of moving objects between adjacent frames. The subsequent detection process is divided into a training stage and a testing stage. During training, the optical flow of the training samples is used as the input of the autoencoder, and the whole autoencoder network is trained by minimizing a loss function. In the testing stage, the optical flow of the training samples and of a test sample is used as input; the encoder is extracted from the trained autoencoder network and extracts features of the input through dimensionality reduction, and the dimensionality-reduced result is then visualized. A hypersphere is used to represent the visualized range of the training samples. When a test sample is input, it is visualized by the same method: if its visualization result falls within the hypersphere, the sample is judged to be normal; otherwise, if it falls outside the hypersphere, the sample is judged to be abnormal, thereby realizing intelligent monitoring of the video.
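The final decision step can be illustrated with a minimal Python sketch. The abstract does not state how the hypersphere is fitted, so the sketch assumes the centre is the mean of the encoded training samples and the radius is the largest distance from that centre to any training sample. The function names `fit_hypersphere` and `is_normal` are hypothetical, and the low-dimensional codes are assumed to have already been produced by the trained encoder.

```python
import math

def fit_hypersphere(train_codes):
    """Fit a hypersphere around encoded training samples.

    Assumption (not stated in the source): centre = per-dimension mean,
    radius = maximum distance from the centre to a training sample.
    """
    dims = len(train_codes[0])
    count = len(train_codes)
    center = [sum(code[d] for code in train_codes) / count for d in range(dims)]
    radius = max(math.dist(code, center) for code in train_codes)
    return center, radius

def is_normal(code, center, radius):
    """Return True if an encoded sample lies inside or on the hypersphere."""
    return math.dist(code, center) <= radius

# Toy 2-D codes standing in for encoder outputs of normal training clips.
train_codes = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
center, radius = fit_hypersphere(train_codes)

inside = is_normal([1.0, 1.0], center, radius)   # at the centre of the training codes
outside = is_normal([5.0, 5.0], center, radius)  # far from all training codes
```

With this rule, a test sample whose code lies close to the training codes is judged normal and one that lies far away is judged abnormal. A stricter radius (for example, a quantile of the training distances rather than the maximum) would be an alternative design choice if the training set contains outliers.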