
A video classification method based on spatiotemporal attention

A video classification technology based on attention, applied to instruments, computing, and character and pattern recognition. It addresses the problems that existing methods ignore the connection between spatial and temporal saliency, fail to make full use of both, and thereby limit the effect of video classification; the invention improves classification accuracy and promotes the learning effect.

Active Publication Date: 2020-10-09
PEKING UNIV

AI Technical Summary

Problems solved by technology

However, existing deep video classification methods cannot simultaneously model spatial and temporal saliency in videos, and they ignore the connection between these two kinds of saliency. As a result, they cannot make full use of both saliencies to learn more effective video features, which limits the effect of video classification.




Embodiment Construction

[0020] The present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0021] The flow of the video classification method based on spatio-temporal attention of the present invention is shown in Figure 1; it specifically includes the following steps:

[0022] (1) Data preprocessing

[0023] Data preprocessing extracts frames and optical flow from the training videos and the videos to be predicted. Optical flow is the motion vector field generated from two consecutive frames of a video, and it can be decomposed into horizontal and vertical components. To make it easier for the deep network to process the motion information in the optical flow, this embodiment alternately stacks the horizontal and vertical components of L consecutive optical flow fields to obtain an image with 2L channels.
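The alternating stacking described above can be sketched as follows. This is a minimal NumPy illustration, assuming each flow field is an (H, W, 2) array whose last axis holds the horizontal and vertical components; the function name and shapes are illustrative, since the patent does not specify an API.

```python
import numpy as np

def stack_flows(flows):
    """Stack L consecutive optical flow fields into one 2L-channel image.

    `flows` is a list of L arrays of shape (H, W, 2), where the last axis
    holds the horizontal (u) and vertical (v) components. The output
    channels alternate u1, v1, u2, v2, ..., uL, vL, mirroring the
    alternating stacking described in the embodiment.
    """
    channels = []
    for flow in flows:
        channels.append(flow[..., 0])  # horizontal component
        channels.append(flow[..., 1])  # vertical component
    return np.stack(channels, axis=-1)  # shape (H, W, 2L)

# Example: L = 5 flow fields of size 224x224 -> a (224, 224, 10) network input
L = 5
flows = [np.random.randn(224, 224, 2).astype(np.float32) for _ in range(L)]
stacked = stack_flows(flows)
print(stacked.shape)  # (224, 224, 10)
```

The 2L-channel result can then be fed to the temporal (flow) stream of the network in the same way a 3-channel RGB frame is fed to the spatial stream.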

[0024] (2) Spatio-temporal attention model construction and training

[0025] The spatio-tempora...



Abstract

The invention relates to a video classification method based on spatio-temporal attention, comprising the following steps: extracting frames and optical flows from the training videos and the videos to be predicted, and stacking several optical flows into multi-channel images; building a spatio-temporal attention model comprising a spatial attention network, a temporal attention network, and a connection network; jointly training the three components of the spatio-temporal attention model to improve the effects of spatial and temporal attention simultaneously, obtaining a spatio-temporal attention model that accurately models spatial and temporal saliency and can be applied to video classification; and using the learned spatio-temporal attention model to extract the spatial and temporal saliency of the frames and optical flows of the video to be predicted, make predictions, and fuse the prediction scores of the frames and optical flows to obtain the final semantic category of the video to be predicted. The present invention can simultaneously model spatial-domain and temporal-domain attention and, through joint training, make full use of their collaboration to learn more accurate spatial-domain and temporal-domain saliency, thereby improving the accuracy of video classification.
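The pipeline the abstract describes — spatial attention over regions within a frame, temporal attention over frames, and late fusion of the frame and optical-flow streams — can be sketched with plain NumPy. Everything here is a hypothetical illustration under assumed shapes (T frames, R regions, D-dimensional features, C classes); the actual networks in the patent are learned deep models, not single linear projections, and the connection network that couples the two attentions is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend_and_classify(features, w_spatial, w_temporal, w_cls):
    """Toy spatio-temporal attention classifier.

    features: (T, R, D) regional features for T frames with R regions each.
    Spatial attention weights regions within each frame; temporal attention
    weights frames across the video; a linear classifier scores the result.
    """
    s = softmax(features @ w_spatial, axis=1)      # (T, R, 1) spatial weights
    frame_feats = (features * s).sum(axis=1)       # (T, D) attended frame features
    t = softmax(frame_feats @ w_temporal, axis=0)  # (T, 1) temporal weights
    video_feat = (frame_feats * t).sum(axis=0)     # (D,) video representation
    return softmax(video_feat @ w_cls)             # (C,) class probabilities

rng = np.random.default_rng(0)
T, R, D, C = 8, 49, 64, 10
feats_rgb = rng.standard_normal((T, R, D))   # frame-stream features
feats_flow = rng.standard_normal((T, R, D))  # flow-stream features
w_s = rng.standard_normal((D, 1))
w_t = rng.standard_normal((D, 1))
w_c = rng.standard_normal((D, C))

score_rgb = attend_and_classify(feats_rgb, w_s, w_t, w_c)
score_flow = attend_and_classify(feats_flow, w_s, w_t, w_c)
fused = 0.5 * score_rgb + 0.5 * score_flow   # late fusion of the two streams
pred = int(np.argmax(fused))                 # final semantic category
```

The equal-weight fusion is one simple choice; the patent only states that the two streams' prediction scores are fused to produce the final category.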

Description

Technical field

[0001] The invention relates to the technical field of video classification, in particular to a video classification method based on spatio-temporal attention.

Background technique

[0002] With the widespread popularization and rapid development of social media and self-media, the number of videos on the Internet is growing rapidly. Research shows that more than 300 hours of video were uploaded to YouTube every minute in 2016. The 2016 video traffic statistics and forecast report of CISCO in the United States further pointed out that global video traffic would account for 82% of Internet traffic in 2020; at that point, it would take a user five million years to watch all the videos transmitted on the Internet within one month. Video and other media data have become the main body of big data, and accurately analyzing and identifying video content is of great significance for meeting users' information acquisition needs.

[0003] Video cl...

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06K9/00
CPC: G06V20/41; G06V20/46
Inventor: 彭宇新, 张俊超
Owner PEKING UNIV