The invention provides a video editing method, system, and device based on scene recognition, and a storage medium. The method comprises the following steps: extracting each frame of an original video as a first picture to generate a first picture set, and arranging the first pictures in the first picture set in their order in the original video to form a frame linked list; cropping the pictures in the first picture set to remove markers, obtaining second pictures that form a second picture set; using a shot recognition model to add a shot label to each second picture in the second picture set in sequence, and adding a scene label to each second picture; editing the second picture set according to a preset target frame number, a target scene label, and a preset scene priority order, and outputting third pictures in sequence to obtain a third picture set; and synthesizing all third pictures in the third picture set in the order given by the frame linked list, and outputting the edited video. The invention enables videos to be clipped automatically in batches, replacing manual video compositing, greatly reducing operating costs, and effectively improving operating efficiency.
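The editing step can be illustrated with a minimal sketch. It assumes that each frame has already been given a shot label and a scene label by the recognition models, and it stands in for the frame linked list with each frame's original index. The names `Frame` and `edit_frames` are hypothetical, and the selection rule shown here is only one plausible interpretation, since the abstract does not specify one: keep frames whose scene label is a target scene, prefer higher-priority scenes, stop at the target frame number, then restore the original frame order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    index: int        # position in the original video (stands in for the frame linked list)
    shot_label: str   # hypothetical shot label from a shot recognition model
    scene_label: str  # hypothetical scene label


def edit_frames(frames, target_count, target_scenes, scene_priority):
    """Hypothetical editing rule: select up to target_count frames whose scene
    label is in target_scenes, preferring scenes listed earlier in
    scene_priority, and return them in original frame order."""
    rank = {scene: i for i, scene in enumerate(scene_priority)}
    candidates = [f for f in frames if f.scene_label in target_scenes]
    # Sort by scene priority first; within a scene, keep original frame order.
    # Scenes missing from scene_priority rank after all listed scenes.
    candidates.sort(key=lambda f: (rank.get(f.scene_label, len(rank)), f.index))
    selected = candidates[:target_count]
    # Re-sort into original video order before synthesis.
    return sorted(selected, key=lambda f: f.index)


if __name__ == "__main__":
    frames = [
        Frame(0, "wide", "beach"),
        Frame(1, "close", "street"),
        Frame(2, "wide", "beach"),
        Frame(3, "medium", "indoor"),
        Frame(4, "close", "street"),
    ]
    result = edit_frames(frames, 3, {"beach", "street"}, ["street", "beach", "indoor"])
    print([f.index for f in result])  # [0, 1, 4]
```

In this example both "street" frames are taken first because "street" has the highest priority. The one remaining slot goes to the earliest "beach" frame. The selected frames are then returned in their original order, which is the order in which the third pictures would be synthesized.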