The invention relates to a visual target retrieval method and system based on target detection. The method comprises the following steps: a preliminary target detection model is trained on a public target detection dataset using a cross-entropy loss function weighted by inverse document frequency (IDF); the preliminary target detection model is fine-tuned on a retrieval dataset containing the target types designated by a user, yielding a final target detection model; and the final target detection model performs feature extraction on the visual target in a query picture to generate multiple convolutional feature maps, the feature maps are aggregated through a spatial attention matrix into an aggregate feature vector, and pictures matching the aggregate feature vector are retrieved from a picture library. Because the method couples visual target retrieval with detection, a separate candidate window prediction step is avoided. The attention matrix is obtained by selectively accumulating the feature maps, and it weights the local descriptors of a convolutional layer as they are aggregated into a global feature representation; using this representation for visual target retrieval improves both retrieval speed and precision.
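The steps above can be illustrated with a minimal sketch in plain Python. It is not the claimed implementation: the abstract does not give formulas, so the IDF weight is assumed to be log(N / n_c) for a class appearing in n_c of N training pictures, the attention matrix is assumed to be the sum of a chosen subset of non-negative feature maps normalised to sum to 1, and matching is assumed to use cosine similarity. All function names and the `selected` parameter are hypothetical.

```python
import math


def idf_class_weights(class_counts, num_images):
    """Assumed IDF weight per class: log(N / n_c); rarer classes weigh more."""
    return {c: math.log(num_images / n) for c, n in class_counts.items()}


def idf_weighted_cross_entropy(probs, label, weights):
    """Cross entropy of one sample, scaled by the IDF weight of its true class."""
    return -weights[label] * math.log(probs[label])


def spatial_attention(feature_maps, selected):
    """Accumulate the selected channels per location (assumed selection rule),
    then normalise so the attention values sum to 1."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    att = [[sum(feature_maps[k][i][j] for k in selected) for j in range(w)]
           for i in range(h)]
    total = sum(sum(row) for row in att)
    if total == 0:  # no activation anywhere: fall back to uniform attention
        return [[1.0 / (h * w)] * w for _ in range(h)]
    return [[v / total for v in row] for row in att]


def aggregate(feature_maps, attention):
    """Attention-weighted sum over locations gives one value per channel;
    the resulting vector is L2-normalised."""
    vec = [sum(a * f
               for a_row, f_row in zip(attention, fmap)
               for a, f in zip(a_row, f_row))
           for fmap in feature_maps]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def retrieve(query_vec, library):
    """Rank library entries by dot product with the query vector
    (equal to cosine similarity when all vectors are L2-normalised)."""
    scores = {name: sum(q * g for q, g in zip(query_vec, vec))
              for name, vec in library.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

In this sketch a query would be processed by calling `spatial_attention` on the feature maps produced by the detection model, passing the result to `aggregate`, and handing the aggregate vector to `retrieve` together with a dictionary of precomputed library vectors. Since only one forward pass and one weighted sum are needed per picture, no candidate windows have to be proposed or scored.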