Video classification based on hybrid convolution and attention mechanism

A video classification and attention technology, applied in the field of image processing, that addresses the large computational cost of end-to-end training of a two-stream network, the difficulty of determining local positions in video frames, and the limited accuracy gains of prior methods; the effects are reduced computational complexity, reduced model complexity, and improved accuracy.

Active Publication Date: 2019-02-26
XIDIAN UNIV
Cites: 9 | Cited by: 84

AI Technical Summary

Problems solved by technology

The shortcomings of this method are that the local positions in video frames are hard to determine when initializing feature extraction, and that end-to-end training of the two-stream network requires a large amount of computation and has poor real-time performance.
The di...



Examples


Embodiment 1

[0033] With the popularity of short videos, research has moved from the field of images to the field of video, and demand for video classification has grown greatly. Existing technologies for video classification suffer from low accuracy and poor real-time performance. For this reason, the present invention proposes a video classification method based on hybrid convolution and an attention mechanism, see figure 1. The invention uses the spatial and temporal information of the video to extract spatio-temporal features and adopts an end-to-end strategy to classify the video, including the following steps:

[0034] (1) Select a video classification dataset: first select and input the dataset corresponding to the videos to be classified. For example, when classifying human-action videos, input a human-action video dataset; all the input datasets are used ...

Embodiment 2

[0047] This video classification method based on hybrid convolution and an attention mechanism is the same as in Embodiment 1; the video hybrid convolutional feature map described in step (5) of the present invention is obtained along the temporal dimension through the following steps:

[0048] (5a) Obtain the hybrid convolutional feature maps of the two video clips: input the two preprocessed video clips into the constructed hybrid convolutional neural network, and take, for each of the two input clips, the 2048 feature maps of 5×5 pixels output by the network's final convolutional layer conv.

[0049] (5b) Merge the hybrid convolutional feature maps of the two video clips along the temporal dimension to obtain the video hybrid convolutional feature map: the 2048 convolutional feature maps of 5×5 pixels from each of the two input clips are combined along the temporal dimension, yielding 2048 hybrid convolutional feature maps of 5×5 p...
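Step (5) above can be sketched as a simple tensor operation. The hybrid convolutional backbone itself is not reproduced here; random arrays stand in for the 2048 feature maps of 5×5 pixels that the final conv layer would emit for each clip, and only the temporal merge is shown:

```python
import numpy as np

rng = np.random.default_rng(0)

# One 2048-channel, 5x5-pixel feature map per video clip
# (hypothetical values standing in for the backbone's conv output).
clip1_features = rng.standard_normal((2048, 5, 5))
clip2_features = rng.standard_normal((2048, 5, 5))

# (5b) Merge along a new time axis -> (2048, 2, 5, 5):
# channels x temporal length x height x width.
video_features = np.stack([clip1_features, clip2_features], axis=1)
print(video_features.shape)  # (2048, 2, 5, 5)
```

The resulting shape matches the 2048×2×5×5 map described in step (6a) of Embodiment 3.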

Embodiment 3

[0052] This video classification method based on hybrid convolution and an attention mechanism is the same as in Embodiments 1-2; the video attention feature map described in step (6) of the present invention is obtained by the attention-mechanism operation as follows:

[0053] (6a) The shape of the obtained video hybrid convolutional feature map is expressed as 2048×2×5×5, where 2048 is the number of channels, 2 is the temporal length, and the two 5s are the height and width of the video hybrid convolutional feature map, respectively.

[0054] (6b) The video hybrid convolutional feature map is unfolded into 2048 feature vectors of dimension 2×5×5=50, forming a feature-vector matrix of size 2048×50.

[0055] (6c) Compute the inner product of the feature-vector matrices F1 and F2 according to the following formula:

[0056] S = F1 · F2

[0057] where the feature-vector matrix F1 is the original 2048×50 matrix and the feature-vector matrix F2 is the transpose of F1, so the resulting matrix S has size 2048×2048.
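Steps (6a)-(6c) can be sketched directly in NumPy. The feature map is a stand-in random array of the documented shape; the flattening and the inner product F1 · F1ᵀ follow the embodiment:

```python
import numpy as np

rng = np.random.default_rng(0)
# (6a) Video hybrid convolutional feature map:
# channels x temporal length x height x width (hypothetical values).
video_features = rng.standard_normal((2048, 2, 5, 5))

# (6b) Unfold each channel into a 2*5*5 = 50-dimensional feature vector,
# giving the 2048 x 50 feature-vector matrix F1.
F1 = video_features.reshape(2048, -1)

# (6c) F2 is the transpose of F1; their inner product is a
# 2048 x 2048 channel-to-channel similarity matrix S.
F2 = F1.T
S = F1 @ F2
print(F1.shape, S.shape)  # (2048, 50) (2048, 2048)
```

S is symmetric here since F2 is simply F1 transposed; the subsequent attention operation described in the patent would normalize these similarities.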



Abstract

The invention discloses a video classification method based on hybrid convolution and an attention mechanism, which solves the problems of complex computation and low accuracy in the prior art. The method comprises the following steps: selecting a video classification dataset; segmented sampling of the input video; preprocessing the two video clips; constructing the hybrid convolutional neural network model; obtaining the video hybrid convolutional feature map along the temporal dimension; obtaining the video attention feature map by the attention-mechanism operation; obtaining a video attention descriptor; training the entire video classification model end to end; and testing the videos to be classified. The invention obtains hybrid convolutional feature maps directly from different video clips; compared with methods that extract optical-flow features, this reduces the computational burden and improves speed. The attention mechanism introduced between different video clips describes the relationship between them and improves accuracy and robustness. The method can be used for video retrieval, video tagging, human-computer interaction, behavior recognition, event detection and anomaly detection.
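The pipeline enumerated above can be sketched end to end. This is a minimal illustration, not the patented implementation: a random-output function stands in for the hybrid convolutional backbone, the attention normalization and descriptor pooling are plausible choices not specified in this excerpt, and the class count (101, as in a typical human-action dataset) is hypothetical. Only the tensor shapes follow the embodiments:

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes = 101  # hypothetical, e.g. a human-action dataset


def backbone(clip):
    """Stand-in for the hybrid CNN: emits a feature map of the
    documented shape (2048 channels, 5x5 pixels) for a clip."""
    return rng.standard_normal((2048, 5, 5))


# Segmented sampling + preprocessing yield two clips (dummy data).
clips = [rng.standard_normal((8, 64, 64, 3)) for _ in range(2)]

# Per-clip feature maps, merged along the time dimension.
feats = np.stack([backbone(c) for c in clips], axis=1)  # (2048, 2, 5, 5)

# Attention over channels: row-softmax of F1 @ F1.T.
F1 = feats.reshape(2048, -1)                            # (2048, 50)
S = F1 @ F1.T                                           # (2048, 2048)
A = np.exp(S - S.max(axis=1, keepdims=True))
A /= A.sum(axis=1, keepdims=True)

# Attention descriptor: attention-weighted features pooled to a vector.
desc = (A @ F1).mean(axis=1)                            # (2048,)

# Linear softmax classifier (untrained weights, shown for shape only).
W_cls = rng.standard_normal((num_classes, 2048)) * 0.01
logits = W_cls @ desc
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.shape)  # (101,)
```

In the patent the whole model, including the backbone and classifier, would be trained end to end rather than evaluated with fixed random weights as here.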

Description

technical field

[0001] The invention belongs to the technical field of image processing and further relates to deep-learning-based video classification, specifically a video classification method based on hybrid convolution and an attention mechanism, which can be used for practical tasks such as video retrieval, video labeling, human-computer interaction, behavior recognition, event detection and anomaly detection.

Background technique

[0002] Video classification has always been a hot topic in the field of image and video processing, and interest in its applications remains strong in recent years. With the popularity of short videos, major platforms have ever higher requirements for the accuracy of video retrieval and video tags: they aim to recommend videos that users are interested in through intelligent classification, saving users time and cost and capturing the information flow. The essence of human-computer interaction and b...

Claims


Application Information

IPC(8): G06K9/00; G06K9/62; G06N3/04; G06N3/08
CPC: G06N3/08; G06V20/41; G06V20/46; G06N3/045; G06F18/24
Inventor 韩红张照宇李阳陈军如高鑫磊岳欣
Owner XIDIAN UNIV