Video action detection method, system, device, and storage medium
An action detection and video technology, applied in the field of video analysis, that addresses shortcomings of existing methods: no interaction modeling, no emphasis on local correlation, and no consideration of the inner correlation between action classes. It handles multi-label problems at low computing cost and improves the robustness and discrimination of the learned representations.
Examples
Embodiment 1
[0040] To address the deficiencies of the prior art, an embodiment of the present invention provides a video action detection method that integrates interaction relationships and category association. The method explicitly models the spatial interaction, short-term temporal interaction, and long-term temporal interaction between action performers, which strengthens the expressive power of the target features and improves the recognition of interactive actions. The method accounts for the heterogeneity of the spatial and temporal dimensions and takes into account both local and global information in the time series. For multi-label problems, a category relationship module is designed to mine the dependencies between different action classes and to use these dependencies to fuse the original category representations, making the learned representations more robust and discriminative, and further improving...
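The category relationship idea in [0040] can be illustrated with a small sketch. The excerpt does not state how the dependencies between action classes are mined or how the fusion is performed, so everything below is an assumption made for illustration: dependencies are estimated from label co-occurrence counts, and fusion is a weighted blend controlled by a hypothetical parameter `alpha`. The function names `cooccurrence_matrix` and `fuse_class_scores` are likewise hypothetical and do not come from the patent.

```python
from typing import List


def cooccurrence_matrix(labels: List[List[int]]) -> List[List[float]]:
    """Estimate class dependencies from multi-label annotations.

    Entry [i][j] approximates P(class j is present | class i is present).
    Assumption: co-occurrence statistics stand in for the learned
    category relationship described in the text.
    """
    num_classes = len(labels[0])
    counts = [[0.0] * num_classes for _ in range(num_classes)]
    for row in labels:
        active = [c for c, present in enumerate(row) if present]
        for i in active:
            for j in active:
                counts[i][j] += 1.0

    matrix = []
    for i in range(num_classes):
        total = counts[i][i]  # number of samples in which class i appears
        if total == 0:
            # Class never observed: fall back to "only related to itself".
            matrix.append([1.0 if j == i else 0.0 for j in range(num_classes)])
        else:
            matrix.append([counts[i][j] / total for j in range(num_classes)])
    return matrix


def fuse_class_scores(
    scores: List[float], relation: List[List[float]], alpha: float = 0.5
) -> List[float]:
    """Blend each class score with evidence propagated from related classes.

    alpha is a hypothetical mixing weight: 0 keeps the original scores,
    1 uses only the relation-weighted scores.
    """
    num_classes = len(scores)
    fused = []
    for j in range(num_classes):
        # Column j: how strongly each class i implies class j.
        weights = [relation[i][j] for i in range(num_classes)]
        norm = sum(weights)  # at least 1.0 because relation[j][j] == 1.0
        propagated = sum(w * s for w, s in zip(weights, scores)) / norm
        fused.append((1.0 - alpha) * scores[j] + alpha * propagated)
    return fused
```

With this blend, a class that frequently co-occurs with a confidently predicted class has its score pulled toward that class's score, which is one simple way a dependency between action classes can influence a multi-label prediction.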
Embodiment 2
[0101] The present invention also provides a video action detection system, implemented mainly on the basis of the method provided in Embodiment 1. As shown in Figure 6, the system mainly includes:
[0102] A video data acquisition module, configured to acquire video clips and determine key frames in the video clips;
[0103] A feature extraction network, whose input is the video clip, configured to obtain, through target detection and feature extraction, the regional features corresponding to all detection boxes of the key frame;
[0104] A short-term interaction module, whose input is the regional features corresponding to all detection boxes of the key frame, configured to model the interaction in the spatial dimension and in the temporal dimension separately to obtain enhanced features;
[0105] A long-term interaction module, whose input is the regional features corresponding to all detection boxes of the key frame together with the enhanced features, is u...
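The data flow between the modules listed above can be sketched as follows. This is only an illustrative wiring under stated assumptions, not the patented implementation: the class `DetectionPipeline`, its field names, and the choice of the middle frame as the key frame are all hypothetical, and each module is represented by a plain callable so that any real detector or network could be substituted.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frame = List[float]           # placeholder for one decoded video frame
Features = List[List[float]]  # one feature vector per detected region


@dataclass
class DetectionPipeline:
    """Hypothetical wiring of the modules described in [0102] to [0105]."""

    # Feature extraction network: (clip, key frame index) -> regional features.
    extract_regions: Callable[[Sequence[Frame], int], Features]
    # Short-term interaction module: regional features -> enhanced features.
    short_term: Callable[[Features], Features]
    # Long-term interaction module: (regional, enhanced) -> its output.
    long_term: Callable[[Features, Features], Features]

    def run(self, clip: Sequence[Frame]) -> Features:
        if not clip:
            raise ValueError("clip must contain at least one frame")
        # Assumption: the key frame is the middle frame of the clip.
        key_index = len(clip) // 2
        regional = self.extract_regions(clip, key_index)
        enhanced = self.short_term(regional)
        # The long-term module receives both the original regional
        # features and the enhanced features, as stated in [0105].
        return self.long_term(regional, enhanced)
```

Passing the modules in as callables keeps the sketch independent of any particular detector or backbone; in practice each callable would wrap a trained network.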
Embodiment 3
[0111] The present invention also provides a processing device. As shown in Figure 7, it mainly includes: one or more processors; and a memory for storing one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the methods provided in the foregoing embodiments.
[0112] Further, the processing device further includes at least one input device and at least one output device; in the processing device, the processor, memory, input device, and output device are connected through a bus.
[0113] In the embodiment of the present invention, the specific types of the memory, input device and output device are not limited; for example:
[0114] The input device can be a touch screen, an image acquisition device, a physical button or a mouse, etc.;
[0115] The output device can be a display terminal;
[0116] The memory may be random access memory (RAM), or non-volatile memory...