The invention discloses a visual simultaneous localization and mapping (Visual-SLAM) method based on a deep convolutional auto-encoder. The method comprises the following steps: 1, preprocessing the training data; 2, establishing a multi-task learning network; 3, taking three adjacent frames of binocular images from the image sequence as the network input; 4, constructing a loss function; 5, training, validating and testing the multi-task network; 6, using the trained shared encoder network for loop closure detection; 7, constructing a new Visual-SLAM front end from the preceding six steps, constructing the Visual-SLAM back end through pose graph optimization or factor graph optimization, and thereby building a complete system; and 8, verifying the positioning accuracy and robustness of the system. The method uses a deep convolutional auto-encoder and a semi-supervised multi-task learning approach to construct the front end of the SLAM system, which covers depth estimation, camera pose estimation, optical flow estimation and semantic segmentation, and it uses the feature maps of the network to construct an image representation for loop closure detection.
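
The loop detection step, in which an encoder feature map is turned into an image representation and compared against earlier frames, might be sketched as follows. The abstract does not say how the representation is built or compared, so the global average pooling, the cosine similarity measure, the threshold, and all function and parameter names below are illustrative assumptions rather than the claimed method.

```python
import math


def global_average_pool(feature_map):
    """Collapse a C x H x W feature map (nested lists) into a C-dim descriptor.

    Assumption: pooling is one simple way to build an image representation;
    the abstract does not specify which one is used.
    """
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_loop(query, keyframes, threshold=0.9, min_gap=2):
    """Return (index, score) of the best-matching earlier keyframe, or None.

    The last `min_gap` keyframes are skipped so that temporally adjacent
    frames are not reported as loop closures (hypothetical parameter).
    """
    candidates = keyframes[:max(0, len(keyframes) - min_gap)]
    best = None
    for index, descriptor in enumerate(candidates):
        score = cosine_similarity(query, descriptor)
        if score >= threshold and (best is None or score > best[1]):
            best = (index, score)
    return best
```

In a full system, a detected pair `(index, score)` would typically be passed to the back end as a loop closure constraint for pose graph or factor graph optimization; the threshold and the gap would need tuning on validation sequences.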