The present invention discloses a cross-scene pedestrian search method based on deep learning. The method comprises: a step of preprocessing each image in a sample library; a step of constructing and training a convolutional neural network; a step of extracting an upper-body local feature vector set and a lower-body local feature vector set from the two groups of preprocessed images, and then fusing the two local feature vector sets to obtain global feature vectors; a step of preprocessing an image to be searched, extracting its upper-body and lower-body local feature vectors, and fusing the two vectors to obtain its global feature vector; and a step of comparing the global feature vector of the image to be searched with the global feature vectors of the sample-library images in turn by cosine similarity, outputting a group of similarity values, and ranking them with a sorting algorithm. The method has the advantages that, with pedestrian images obtained from surveillance video as the sample library, no hand-crafted feature design is needed, robustness is high, and the accuracy of actual searches is high.
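The fusion and comparison steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the fusion is a simple concatenation of the upper-body and lower-body vectors (the abstract does not specify the fusion method), and all function names are hypothetical.

```python
import numpy as np

def fuse(upper_vec, lower_vec):
    # Fuse the upper-body and lower-body local feature vectors into one
    # global feature vector. Concatenation is an assumption; the patent
    # abstract does not state the fusion operation.
    return np.concatenate([upper_vec, lower_vec])

def cosine_similarity(a, b):
    # Cosine similarity between two feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_gallery(query_global, gallery_globals):
    # Compare the query's global feature vector with each sample-library
    # vector in turn, then sort indices by descending similarity.
    sims = [cosine_similarity(query_global, g) for g in gallery_globals]
    order = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    return order, sims
```

For example, fusing per-image upper/lower CNN features for both the query and every sample-library image, then calling `rank_gallery`, yields the sorted list of candidate matches described in the final step.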