Video classification based on hybrid convolution and attention mechanism

A video classification and attention technology, applied in the field of image processing, that addresses the large computational cost of end-to-end training of a two-stream network, the difficulty of determining local positions in video frames, and the limited accuracy gains of prior methods; the effects are reduced computational complexity, reduced model complexity, and improved accuracy.

Active Publication Date: 2019-02-26
XIDIAN UNIV
Cites: 9 | Cited by: 84

AI Technical Summary

Problems solved by technology

The shortcomings of this method are that the local positions in video frames are hard to determine when initializing feature extraction, and that end-to-end training of the two-stream network requires a large amount of computation and has poor real-time performance.
The di...



Examples


Embodiment 1

[0033] With the popularity of short videos, research has moved from the field of images to the field of video, and demand for video classification has grown greatly. Existing technologies for video classification suffer from low accuracy and poor real-time performance. For this reason, the present invention proposes a video classification method based on hybrid convolution and an attention mechanism, see figure 1. The invention uses the spatial and temporal information of the video to extract spatio-temporal features and adopts an end-to-end strategy to classify the video, including the following steps:

[0034] (1) Select a video classification dataset: first select and input the dataset corresponding to the videos to be classified. For example, when classifying human-action videos, input a human-action video dataset; all the input datasets are used ...

Embodiment 2

[0047] This video classification method based on hybrid convolution and an attention mechanism is the same as in Embodiment 1; the video hybrid convolutional feature map described in step (5) of the present invention is obtained along the temporal dimension through the following steps:

[0048] (5a) Obtain the hybrid convolutional feature maps of the two video clips: input the two preprocessed video clips into the constructed hybrid convolutional neural network, and take, for each of the two input clips, the 2048 feature maps of 5×5 pixels output by the network's final convolutional layer conv.

[0049] (5b) Merge the hybrid convolutional feature maps of the two video clips along the temporal dimension to obtain the video hybrid convolutional feature map: the 2048 convolutional feature maps of 5×5 pixels from each of the two input clips are combined along the temporal dimension, yielding 2048 hybrid convolutional feature maps of 5×5 p...
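Step (5) above can be sketched as a simple tensor operation. The hybrid convolutional backbone itself is not reproduced here; random arrays stand in for the 2048 feature maps of 5×5 pixels that the final conv layer would emit for each clip, and only the temporal merge is shown:

```python
import numpy as np

rng = np.random.default_rng(0)

# One 2048-channel, 5x5-pixel feature map per video clip
# (hypothetical values standing in for the backbone's conv output).
clip1_features = rng.standard_normal((2048, 5, 5))
clip2_features = rng.standard_normal((2048, 5, 5))

# (5b) Merge along a new time axis -> (2048, 2, 5, 5):
# channels x temporal length x height x width.
video_features = np.stack([clip1_features, clip2_features], axis=1)
print(video_features.shape)  # (2048, 2, 5, 5)
```

The resulting shape matches the 2048×2×5×5 map described in step (6a) of Embodiment 3.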

Embodiment 3

[0052] This video classification method based on hybrid convolution and an attention mechanism is the same as in Embodiments 1-2; the video attention feature map described in step (6) of the present invention is obtained by the attention-mechanism operation as follows:

[0053] (6a) The shape of the obtained video hybrid convolutional feature map is expressed as 2048×2×5×5, where 2048 is the number of channels, 2 is the temporal length, and the two 5s are the height and width of the video hybrid convolutional feature map, respectively.

[0054] (6b) The video hybrid convolutional feature map is unfolded into 2048 feature vectors of dimension 2×5×5=50, forming a feature-vector matrix of size 2048×50.

[0055] (6c) Compute the inner product of the feature-vector matrices F1 and F2 according to the following formula:

[0056] S = F1 · F2

[0057] where the feature-vector matrix F1 is the original 2048×50 matrix and the feature-vector matrix F2 is the transpose of F1, so the resulting matrix S has size 2048×2048.
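Steps (6a)-(6c) can be sketched directly in NumPy. The feature map is a stand-in random array of the documented shape; the flattening and the inner product F1 · F1ᵀ follow the embodiment:

```python
import numpy as np

rng = np.random.default_rng(0)
# (6a) Video hybrid convolutional feature map:
# channels x temporal length x height x width (hypothetical values).
video_features = rng.standard_normal((2048, 2, 5, 5))

# (6b) Unfold each channel into a 2*5*5 = 50-dimensional feature vector,
# giving the 2048 x 50 feature-vector matrix F1.
F1 = video_features.reshape(2048, -1)

# (6c) F2 is the transpose of F1; their inner product is a
# 2048 x 2048 channel-to-channel similarity matrix S.
F2 = F1.T
S = F1 @ F2
print(F1.shape, S.shape)  # (2048, 50) (2048, 2048)
```

S is symmetric here since F2 is simply F1 transposed; the subsequent attention operation described in the patent would normalize these similarities.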



Abstract

The invention discloses a video classification method based on hybrid convolution and an attention mechanism, which solves the problems of complex computation and low accuracy in the prior art. The method comprises the following steps: selecting a video classification dataset; segmented sampling of the input video; preprocessing the two video clips; constructing the hybrid convolutional neural network model; obtaining the video hybrid convolutional feature map along the temporal dimension; obtaining the video attention feature map by the attention-mechanism operation; obtaining a video attention descriptor; training the entire video classification model end to end; and testing the videos to be classified. The invention obtains hybrid convolutional feature maps directly from different video clips; compared with methods that extract optical-flow features, this reduces the computational burden and improves speed. The attention mechanism introduced between different video clips describes the relationship between them and improves accuracy and robustness. The method can be used for video retrieval, video tagging, human-computer interaction, behavior recognition, event detection and anomaly detection.
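The pipeline enumerated above can be sketched end to end. This is a minimal illustration, not the patented implementation: a random-output function stands in for the hybrid convolutional backbone, the attention normalization and descriptor pooling are plausible choices not specified in this excerpt, and the class count (101, as in a typical human-action dataset) is hypothetical. Only the tensor shapes follow the embodiments:

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes = 101  # hypothetical, e.g. a human-action dataset


def backbone(clip):
    """Stand-in for the hybrid CNN: emits a feature map of the
    documented shape (2048 channels, 5x5 pixels) for a clip."""
    return rng.standard_normal((2048, 5, 5))


# Segmented sampling + preprocessing yield two clips (dummy data).
clips = [rng.standard_normal((8, 64, 64, 3)) for _ in range(2)]

# Per-clip feature maps, merged along the time dimension.
feats = np.stack([backbone(c) for c in clips], axis=1)  # (2048, 2, 5, 5)

# Attention over channels: row-softmax of F1 @ F1.T.
F1 = feats.reshape(2048, -1)                            # (2048, 50)
S = F1 @ F1.T                                           # (2048, 2048)
A = np.exp(S - S.max(axis=1, keepdims=True))
A /= A.sum(axis=1, keepdims=True)

# Attention descriptor: attention-weighted features pooled to a vector.
desc = (A @ F1).mean(axis=1)                            # (2048,)

# Linear softmax classifier (untrained weights, shown for shape only).
W_cls = rng.standard_normal((num_classes, 2048)) * 0.01
logits = W_cls @ desc
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.shape)  # (101,)
```

In the patent the whole model, including the backbone and classifier, would be trained end to end rather than evaluated with fixed random weights as here.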

Description

technical field

[0001] The invention belongs to the technical field of image processing and further relates to deep-learning-based video classification, specifically a video classification method based on hybrid convolution and an attention mechanism, which can be used for practical tasks such as video retrieval, video labeling, human-computer interaction, behavior recognition, event detection and anomaly detection.

Background technique

[0002] Video classification has always been a hot topic in the field of image and video processing, and interest in its applications remains strong in recent years. With the popularity of short videos, major platforms have ever higher requirements for the accuracy of video retrieval and video tags: they aim to recommend videos that users are interested in through intelligent classification, saving users time and cost and capturing the information flow. The essence of human-computer interaction and b...

Claims


Application Information

IPC(8): G06K9/00; G06K9/62; G06N3/04; G06N3/08
CPC: G06N3/08; G06V20/41; G06V20/46; G06N3/045; G06F18/24
Inventor 韩红张照宇李阳陈军如高鑫磊岳欣
Owner XIDIAN UNIV