The invention discloses a view landmark retrieval method based on end-to-end deep learning, the method comprising the following steps: S1, collecting key landmark images, preprocessing them, and using them as training data; S2, embedding a locally aggregated descriptor feature vector method into a convolutional neural network (CNN) to form an end-to-end CNN model; S3, inputting the collected training data into the end-to-end CNN model, extracting local invariant image features, training the CNN model through an error function, and learning the optimal aggregation cluster center points; S4, extracting key frame pictures from the video stream to be identified, and down-sampling the extracted key frames together with the picture stream to be identified to generate a to-be-identified landmark data set Q; S5, inputting Q into the trained CNN model, extracting local invariant feature vectors, and outputting a computed result for each landmark category through a fully connected layer and a data output layer; and S6, judging, according to the key landmark category thresholds set during training, whether each piece of data in Q belongs to a key landmark category, and if so, outputting the picture source name and a landmark prompt.
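The aggregation of steps S2 and S3 and the threshold decision of step S6 can be illustrated with a minimal NumPy sketch. It assumes that the locally aggregated descriptor method is a VLAD-style aggregation with soft assignment to the cluster centers, which the abstract does not state explicitly. The function names `soft_vlad` and `detect_landmarks`, the sharpness parameter `alpha`, and the choice to test only the top-scoring category per picture are all hypothetical and chosen for illustration only; in the disclosed method the cluster centers are learned inside the CNN rather than supplied as a fixed array.

```python
import numpy as np


def soft_vlad(features, centers, alpha=10.0):
    """Aggregate N local descriptors (N, D) against K cluster centers (K, D).

    Assumed VLAD-style aggregation; `alpha` (hypothetical) controls how
    sharply each descriptor is assigned to its nearest center.
    """
    # Residual of every descriptor to every center: shape (N, K, D).
    residuals = features[:, None, :] - centers[None, :, :]
    # Squared distances: shape (N, K).
    d2 = (residuals ** 2).sum(axis=2)
    # Soft assignment: softmax over centers of -alpha * squared distance.
    logits = -alpha * d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    # Weighted sum of residuals per center: shape (K, D).
    vlad = (weights[:, :, None] * residuals).sum(axis=0)
    # Normalize each center's row, then the flattened vector.
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12
    vlad = vlad.ravel()
    return vlad / (np.linalg.norm(vlad) + 1e-12)


def detect_landmarks(scores, thresholds, names, class_labels):
    """Return (picture source name, landmark label) pairs whose best
    category score reaches that category's threshold (step S6)."""
    hits = []
    for name, row in zip(names, scores):
        best = int(np.argmax(row))
        if row[best] >= thresholds[best]:
            hits.append((name, class_labels[best]))
    return hits


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    local_features = rng.normal(size=(50, 8))  # stand-in for CNN features
    cluster_centers = rng.normal(size=(4, 8))  # stand-in for learned centers
    descriptor = soft_vlad(local_features, cluster_centers)
    print(descriptor.shape)  # one K*D-dimensional vector per image
```

In this sketch the per-category scores passed to `detect_landmarks` stand in for the output of the fully connected layer and data output layer of step S5; how those scores and thresholds are actually produced is left to the trained model.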