The invention discloses a deep-learning-based method and system for weakly supervised temporal action localization in videos. The method comprises the following steps: S1, extracting the current frame and the previous frame of a video, estimating the optical flow between them with an optical flow estimation network, and feeding the optical flow together with frames sampled from the video at equal intervals into a two-stream action recognition network to extract video features; S2, performing semantic consistency modeling on the video features to obtain embedded features; S3, mapping the embedded features to a class activation sequence with a trainable classification module; S4, updating the video features with an attention module; S5, taking the updated video features as the input of the next cycle and repeating S2 to S4 until a stopping condition is met; S6, fusing the class activation sequences generated in all cycles and calculating a classification loss between the estimated action classes and the ground-truth class labels; S7, fusing the embedded features of all cycles to calculate a similarity loss between action features; and S8, obtaining a target loss from the classification loss and the similarity loss and updating the model parameters of the system.
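The cyclic structure of steps S2 to S8 can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the abstract does not specify the form of the semantic consistency modeling, the attention update, the pooling, the similarity loss or the stopping rule, so every such choice below is a hypothetical stand-in. Specifically, it assumes that S1 features are already extracted (random arrays stand in for them); that S2 is a linear projection followed by cosine-affinity aggregation over snippets; that S4 is a sigmoid attention gating a residual feature update; that S5 stops after a fixed NUM_CYCLES; that S6 uses top-k temporal pooling with cross-entropy; and that S7 is one minus the cosine similarity of attention-pooled action features from two videos sharing a class. Parameter names (W_emb, W_cls, w_att, W_back, LAMBDA_SIM) are illustrative, and the gradient update in S8 is omitted because plain NumPy has no automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)

T, D, E, C = 20, 32, 16, 4  # snippets, feature dim, embedding dim, classes (hypothetical sizes)
NUM_CYCLES = 3              # hypothetical fixed stopping rule for S5
LAMBDA_SIM = 0.5            # hypothetical weight of the similarity loss in S8

# Hypothetical parameters, shared across all cycles.
W_emb = rng.normal(scale=0.1, size=(D, E))
W_cls = rng.normal(scale=0.1, size=(E, C))
w_att = rng.normal(scale=0.1, size=(E,))
W_back = rng.normal(scale=0.1, size=(E, D))  # maps embeddings back to feature space for S4


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def semantic_consistency(x):
    """S2 (assumed form): project, then average each snippet with semantically similar snippets."""
    h = x @ W_emb                                      # (T, E)
    hn = h / (np.linalg.norm(h, axis=1, keepdims=True) + 1e-8)
    affinity = softmax(hn @ hn.T, axis=1)              # (T, T) row-normalised cosine affinity
    return affinity @ h                                # (T, E)


def forward(x):
    """Runs S2-S4 for NUM_CYCLES cycles (S5); returns fused CAS, fused embedding, last attention."""
    cas_list, emb_list = [], []
    att = None
    for _ in range(NUM_CYCLES):
        emb = semantic_consistency(x)                  # S2: embedded features
        cas = emb @ W_cls                              # S3: class activation sequence (T, C)
        att = sigmoid(emb @ w_att)[:, None]            # S4: snippet-level attention (T, 1)
        x = x + att * (emb @ W_back)                   # S4: attention-gated residual update (assumed)
        cas_list.append(cas)
        emb_list.append(emb)
    # S6/S7 fusion by averaging over cycles (assumed fusion rule).
    return np.mean(cas_list, axis=0), np.mean(emb_list, axis=0), att


def classification_loss(cas, labels, k=4):
    """S6: top-k temporal pooling -> video-level scores -> cross-entropy with normalised labels."""
    topk = np.sort(cas, axis=0)[-k:]                   # (k, C) largest activations per class
    probs = softmax(topk.mean(axis=0))                 # (C,)
    target = labels / labels.sum()
    return float(-(target * np.log(probs + 1e-8)).sum())


def similarity_loss(emb_a, att_a, emb_b, att_b):
    """S7 (assumed form): action features of two videos sharing a class should be similar."""
    fa = (att_a * emb_a).sum(axis=0) / (att_a.sum() + 1e-8)
    fb = (att_b * emb_b).sum(axis=0) / (att_b.sum() + 1e-8)
    cos = fa @ fb / (np.linalg.norm(fa) * np.linalg.norm(fb) + 1e-8)
    return float(1.0 - cos)


# Two hypothetical videos whose S1 features are random stand-ins; both contain class 2.
labels = np.array([0.0, 0.0, 1.0, 0.0])
x_a, x_b = rng.normal(size=(T, D)), rng.normal(size=(T, D))
cas_a, emb_a, att_a = forward(x_a)
cas_b, emb_b, att_b = forward(x_b)

l_cls = classification_loss(cas_a, labels) + classification_loss(cas_b, labels)
l_sim = similarity_loss(emb_a, att_a, emb_b, att_b)
total = l_cls + LAMBDA_SIM * l_sim                     # S8: target loss (parameter update omitted)
print(f"cls={l_cls:.4f} sim={l_sim:.4f} total={total:.4f}")
```

In a trainable system the same computation would be written in an autodiff framework so that the target loss from S8 can be back-propagated to W_emb, W_cls, w_att and W_back.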