The invention discloses a cross-modal retrieval method based on modality-specific and shared feature learning, which comprises the following steps: S1, acquiring a cross-modal retrieval data set and dividing it into a training set and a test set; S2, extracting features from the text and image samples respectively; S3, extracting modality-specific features and modality-shared features; S4, generating hash codes for the samples of each modality through a hash network; S5, training the network by combining the loss function of the adversarial autoencoder network with the loss function of the hash network; and S6, performing cross-modal retrieval on the samples in the test set using the network trained in step S5. In this method, a hash network is designed to project the encoded features of the image channel, the encoded features of the text channel, and the modality-shared features into a common Hamming space, and modeling with the label information together with the modality-specific and shared features ensures that the output hash codes have better inter-modal and intra-modal semantic discriminability.
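
The abstract names the components but not their concrete form. As a minimal sketch only, the PyTorch code below illustrates the kind of hash network described in S4 (projecting image-channel, text-channel, and shared features into a common Hamming space) together with one plausible similarity-preserving hash loss of the sort that S5 would combine with the adversarial-autoencoder loss. All names (HashNet, pairwise_hash_loss), layer sizes (feat_dim, hash_bits), and the specific loss form are assumptions for illustration, not the patented design.

    # Illustrative sketch only: layer sizes, names, and the loss form below
    # are assumptions; the patent does not publish its architecture.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HashNet(nn.Module):
        """Projects modality-specific and modality-shared features into a
        common Hamming space (hypothetical layer sizes)."""
        def __init__(self, feat_dim=512, hash_bits=64):
            super().__init__()
            # one projection head per channel: image-specific, text-specific, shared
            self.img_head = nn.Linear(feat_dim, hash_bits)
            self.txt_head = nn.Linear(feat_dim, hash_bits)
            self.shared_head = nn.Linear(feat_dim, hash_bits)

        def forward(self, img_feat, txt_feat, shared_feat):
            # tanh relaxes the discrete sign() so gradients flow during training
            h_img = torch.tanh(self.img_head(img_feat))
            h_txt = torch.tanh(self.txt_head(txt_feat))
            h_shared = torch.tanh(self.shared_head(shared_feat))
            return h_img, h_txt, h_shared

    def pairwise_hash_loss(h1, h2, sim, hash_bits=64):
        """Similarity-preserving loss: pairs with the same label (sim=1)
        should get close codes, dissimilar pairs (sim=0) distant codes."""
        inner = h1 @ h2.t() / hash_bits         # normalized inner product in [-1, 1]
        return F.mse_loss(inner, 2 * sim - 1)   # target +1 if similar, -1 if not

    # Hypothetical usage with random features (batch of 8, 512-dim)
    net = HashNet()
    img, txt, shared = (torch.randn(8, 512) for _ in range(3))
    h_img, h_txt, h_shared = net(img, txt, shared)
    sim = torch.eye(8)                          # toy label-similarity matrix
    loss = pairwise_hash_loss(h_img, h_txt, sim)

At retrieval time the relaxed codes would be binarized with torch.sign() and compared by Hamming distance; the adversarial-autoencoder loss that S5 combines with the hash loss is omitted here for brevity.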