Two-stage temporal action detection method, device, equipment and medium
A two-stage action detection technology, applied in neural learning methods, character and pattern recognition, instruments, etc. It addresses problems such as low recognition accuracy, low judgment accuracy, and special requirements on the length of the video to be detected, and achieves highly stable and robust recognition.
AI Technical Summary
Problems solved by technology
 Two-stage temporal action detection is performed on the public datasets THUMOS-14 and ActivityNet-1.3 in the following steps:
 S1, obtain the video information features;
 S2, extract candidate boundaries from the video information features, and obtain candidate boxes from the candidate boundaries;
 S3, refine the boundaries of the candidate boxes and determine the actions in the video.
 In step S1, the video is cut in order into N segments of equal length, each 16 frames long. The RGB stream and the optical flow stream of all segments are extracted and input to a 3D action recognition model to obtain RGB features and optical flow features. The RGB features and optical flow features are then fused to represent the information of the entire video.
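 The segmentation and fusion in step S1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `split_into_segments` and `fuse_features` are hypothetical, dropping trailing frames that do not fill a segment is an assumption, and fusing by concatenation is an assumption (the text only says the features are fused). The 3D action recognition model itself is not shown; its outputs are stood in for by placeholder arrays.

```python
import numpy as np

SEGMENT_LEN = 16  # frames per segment, as stated in step S1


def split_into_segments(frames: np.ndarray, segment_len: int = SEGMENT_LEN) -> np.ndarray:
    """Cut a (T, ...) frame array into consecutive, equal-length segments.

    Assumption: trailing frames that do not fill a whole segment are dropped.
    Returns an array of shape (N, segment_len, ...).
    """
    n_segments = frames.shape[0] // segment_len
    usable = frames[: n_segments * segment_len]
    return usable.reshape(n_segments, segment_len, *frames.shape[1:])


def fuse_features(rgb_feat: np.ndarray, flow_feat: np.ndarray) -> np.ndarray:
    """Fuse per-segment RGB and optical flow features.

    Assumption: fusion is channel-wise concatenation; the source does not
    specify the fusion operator.
    """
    if rgb_feat.shape[0] != flow_feat.shape[0]:
        raise ValueError("RGB and flow features must cover the same segments")
    return np.concatenate([rgb_feat, flow_feat], axis=1)


# Example: a 100-frame video of 8x8 RGB frames yields 6 full segments.
video = np.zeros((100, 8, 8, 3), dtype=np.float32)
segments = split_into_segments(video)

# Placeholder features standing in for the 3D model's outputs (hypothetical sizes).
rgb_features = np.ones((segments.shape[0], 4), dtype=np.float32)
flow_features = np.zeros((segments.shape[0], 4), dtype=np.float32)
video_features = fuse_features(rgb_features, flow_features)
```

 With concatenation, each segment keeps both modalities side by side, so later stages can weigh appearance and motion independently; other fusion operators would fit the same interface.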
 Step S2 includes the following sub-steps:
 S21, convert video information charact...
 Comparative Example 1: the candidate box generation results on the THUMOS-14 dataset are shown in Table 1.
 Table 1
 Among them, @50, @100 and @200 denote the average recall when 50, 100 and 200 candidate boxes are generated per video. The higher the average recall, the better the performance. As the table shows, the recall of Example 1 of this application is significantly higher than that of the other methods.
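 The average recall at a fixed number of candidate boxes can be sketched as below. This is an illustrative computation only: the function names `tiou` and `average_recall` are hypothetical, and the set of temporal IoU thresholds over which recall is averaged is passed in by the caller because the text above does not state it.

```python
import numpy as np


def tiou(segment: np.ndarray, proposals: np.ndarray) -> np.ndarray:
    """Temporal IoU between one (start, end) segment and an (M, 2) proposal array."""
    inter = np.minimum(segment[1], proposals[:, 1]) - np.maximum(segment[0], proposals[:, 0])
    inter = np.clip(inter, 0.0, None)
    union = (segment[1] - segment[0]) + (proposals[:, 1] - proposals[:, 0]) - inter
    return inter / np.maximum(union, 1e-12)


def average_recall(videos, k: int, thresholds) -> float:
    """Average recall when the top-k candidate boxes are kept per video.

    videos: list of (ground_truth, proposals) pairs, each an (n, 2) array of
            (start, end) times; proposals are sorted by descending score.
    thresholds: temporal IoU thresholds to average over (caller's choice).
    """
    recalls = []
    for threshold in thresholds:
        hits, total = 0, 0
        for ground_truth, proposals in videos:
            top_k = proposals[:k]
            for segment in ground_truth:
                total += 1
                if len(top_k) and tiou(segment, top_k).max() >= threshold:
                    hits += 1
        recalls.append(hits / total if total else 0.0)
    return float(np.mean(recalls))


# Toy example: one ground-truth action and two ranked candidate boxes.
gt = np.array([[0.0, 10.0]])
candidates = np.array([[0.0, 5.0], [0.0, 10.0]])
ar_at_1 = average_recall([(gt, candidates)], k=1, thresholds=[0.5, 0.75])
ar_at_2 = average_recall([(gt, candidates)], k=2, thresholds=[0.5, 0.75])
```

 In the toy example, keeping only the first candidate matches the action at the 0.5 threshold but not at 0.75, while keeping both candidates matches at every threshold, which is why recall rises with the number of candidate boxes.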
 Comparative Example 1: the candidate box generation results on the ActivityNet-1.3 dataset are shown in Table 2.
 Table 2
 Among them, AR@AN=100 denotes the average recall when 100 candidate boxes are generated per video; the higher the average recall, the better the performance. AUC is the area enclosed by the AR@AN curve and the coordinate axes; the larger the value, the better the performance. As can be seen from the table, the candidate frame i...
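 The AUC described above can be approximated from sampled points of the AR versus AN curve with the trapezoidal rule. The sketch below is illustrative: the function name `ar_an_auc` is hypothetical, the sample values are made up for demonstration and are not results from the tables, and dividing by the AN range so that the result lies between 0 and 1 is an assumption about normalization.

```python
import numpy as np


def ar_an_auc(an_values, ar_values) -> float:
    """Area under the AR-vs-AN curve by the trapezoidal rule.

    an_values: increasing numbers of candidate boxes per video (AN).
    ar_values: average recall measured at each AN.
    Assumption: the area is divided by the AN range to give a value in [0, 1].
    """
    an = np.asarray(an_values, dtype=float)
    ar = np.asarray(ar_values, dtype=float)
    if an.shape != ar.shape or an.size < 2:
        raise ValueError("need at least two matching (AN, AR) points")
    # Sum of trapezoid areas between consecutive sample points.
    area = np.sum((an[1:] - an[:-1]) * (ar[1:] + ar[:-1]) / 2.0)
    return float(area / (an[-1] - an[0]))


# Made-up sample points for demonstration only.
demo_auc = ar_an_auc([1, 50, 100], [0.2, 0.6, 0.8])
```

 A curve that reaches high recall with few candidate boxes encloses more area, so AUC rewards methods whose best candidates are ranked first, not only those with high recall at AN = 100.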