The invention provides a monocular visual-inertial combined positioning and navigation method. The method comprises the following steps: acquiring a video stream and an IMU data stream, and packaging and aligning the two streams; initializing the video stream and the IMU data stream, wherein the initialization comprises: visual initialization, IMU initialization, determining the transformation between the visual coordinate system and the IMU world coordinate system, performing nonlinear optimization to determine an initial scale value, and refining the scale estimate with a lambda-I EKF; acquiring inertial navigation data from the IMU data stream, obtaining an IMU pose through a combined complementary-filtering and pre-integration technique, tracking image features in the video stream, and obtaining a visual pose with reference to the change in the IMU pose; and determining whether visual tracking is lost: if tracking is lost, performing motion tracking with the IMU pose; if tracking is not lost, fusing the visual pose and the IMU pose through an IDSF technique to obtain the final camera pose. Addressing the shortcomings of the prior art, the method improves scale accuracy and achieves high precision and high robustness in positioning.
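The complementary-filtering step mentioned above can be illustrated with a minimal roll/pitch attitude sketch: the gyroscope rates are integrated for short-term accuracy, and the drift is corrected with the tilt implied by the accelerometer's gravity measurement. This is a generic illustration, not the patented implementation; the function names and the blending factor `alpha = 0.98` are assumptions.

```python
import math

def accel_tilt(ax, ay, az):
    """Roll and pitch implied by the gravity direction measured by the
    accelerometer (valid when external acceleration is small)."""
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, math.hypot(ay, az))
    return roll, pitch

def complementary_update(roll, pitch, gyro, accel, dt, alpha=0.98):
    """One complementary-filter step.

    gyro:  (gx, gy, gz) angular rates in rad/s
    accel: (ax, ay, az) specific force in m/s^2
    alpha: weight on the integrated gyro estimate (assumed value);
           the accelerometer tilt gets weight (1 - alpha) and slowly
           pulls the drifting gyro integration back toward gravity.
    """
    gx, gy, _ = gyro
    acc_roll, acc_pitch = accel_tilt(*accel)
    roll = alpha * (roll + gx * dt) + (1.0 - alpha) * acc_roll
    pitch = alpha * (pitch + gy * dt) + (1.0 - alpha) * acc_pitch
    return roll, pitch
```

For a stationary IMU (accelerometer reading pure gravity, zero gyro rates), repeated updates decay any initial attitude error toward zero at rate `alpha` per step, which is the drift-correction behavior the combined filtering/pre-integration step relies on.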