The invention relates to the technical field of computer vision, and in particular to a weakly supervised temporal action detection method based on spatio-temporal correlation learning, comprising the following steps: S1, extracting features from video frames through an I3D network; S2, constructing a dynamic spatial graph network structure for the video to obtain spatial features of the video; S3, constructing a one-dimensional temporal convolutional network to obtain temporal features of the video; S4, fusing the temporal features and the spatial features; S5, applying an action-background attention mechanism, comprising action attention and background attention, each of which is used to pool the original video features; S6, predicting a class activation sequence of the spatio-temporal correlation of actions and backgrounds in the video, predicting an action activation sequence or a background activation sequence in the video, and obtaining three corresponding classification losses; S7, calculating a total loss function; and S8, using the trained model for action detection. The method addresses the problem that action instances detected by existing weakly supervised temporal action detection methods are incomplete and inaccurate.
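
Steps S5 to S7 can be illustrated with a minimal sketch in plain Python. The abstract does not give the formulas, so everything specific here is an assumption made for illustration: the background attention is taken as the complement of the action attention, the classifier is a single linear layer with an extra background class, video-level scores come from top-k mean pooling over time, the three losses are cross-entropy terms, and the total loss is their weighted sum. All function names and the weights `lambdas` are hypothetical. Steps S1 to S4 (I3D features, graph network, temporal convolution, fusion) are not modeled; `features` stands in for their fused output.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(features, attention):
    # S5 (assumed form): scale each snippet feature by its attention weight.
    return [[a * f for f in feat] for feat, a in zip(features, attention)]

def class_activation_sequence(features, weights):
    # S6 (assumed form): linear classifier per snippet; returns T x (C+1) scores.
    return [[sum(w * f for w, f in zip(wc, feat)) for wc in weights]
            for feat in features]

def video_level_scores(cas, k):
    # Top-k mean pooling over time for each class (an assumed aggregation).
    scores = []
    for c in range(len(cas[0])):
        top = sorted((row[c] for row in cas), reverse=True)[:k]
        scores.append(sum(top) / len(top))
    return scores

def cross_entropy(scores, target):
    # Cross-entropy between softmax(scores) and a normalized target vector.
    probs = softmax(scores)
    norm = sum(target)
    return -sum((t / norm) * math.log(p) for t, p in zip(target, probs) if t > 0)

def total_loss(features, act_att, weights, label, k=2, lambdas=(1.0, 1.0, 1.0)):
    # Assumption: background attention is the complement of action attention.
    bkg_att = [1.0 - a for a in act_att]
    branches = [
        (features, list(label) + [1]),                              # base: actions + background
        (attention_pool(features, act_att), list(label) + [0]),     # action branch
        (attention_pool(features, bkg_att), [0] * len(label) + [1]) # background branch
    ]
    parts = []
    for feats, target in branches:
        cas = class_activation_sequence(feats, weights)
        parts.append(cross_entropy(video_level_scores(cas, k), target))
    # S7 (assumed form): weighted sum of the three classification losses.
    total = sum(lam * part for lam, part in zip(lambdas, parts))
    return total, tuple(parts)

# Toy input: 3 snippets, 2-dimensional features, 2 action classes + background.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
act_att = [0.9, 0.1, 0.5]
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
label = [1, 0]  # video-level label: only the first action class is present
total, parts = total_loss(features, act_att, weights, label)
print(total, parts)
```

In this sketch the three branches share one classifier and differ only in how the features are pooled and which video-level target they are trained against, which is one plausible reading of "three classification losses".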