A two-stream video classification method and device based on cross-modal attention mechanism

A video classification and attention technology, applied in the field of computer vision. It addresses the difficulty of quickly and accurately locating key objects, the lack of a modeling method for "moving objects", and the scarcity of related research. Its effects are improved video classification accuracy and better compatibility.

Active Publication Date: 2021-06-22

PEKING UNIV +2

5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0005] Second, current technology still struggles to locate key objects quickly and accurately.

Relatively few studies address how to capture key clues, that is, how to introduce the attention mechanism into video classification. The most representative is the non-local neural network (Non-local Neural Networks), but that network can only attend to important information within a single modality, and it has no dedicated way to model "moving objects".

Embodiment Construction

[0035] In order to make the above objects, features and advantages of the present invention more comprehensible, the present invention will be further described in detail below through specific embodiments and accompanying drawings.

[0036] 1. Configuration of cross-modal attention module

[0037] The cross-modal attention module can handle input of any dimension and keeps the output shape identical to the input shape, so it has excellent compatibility. Taking the 2-dimensional configuration as an example, Q, K, and V are each obtained by a 1x1 2-dimensional convolution (for 3-dimensional models, a 1x1x1 3-dimensional convolution). To reduce computational complexity and save GPU memory, these convolutions also reduce the channel dimension while producing Q, K, and V. To further simplify the operation, a max-pooling operation can be performed before the convolu...
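The 2-dimensional configuration described in [0037] can be illustrated with a minimal NumPy sketch. Several details here are assumptions rather than statements from the patent text: that Q comes from the stream being updated while K and V come from the other modality, that the weights are combined with a softmax, and that an output 1x1 convolution plus a residual connection (borrowed from the non-local block) restores the channel count. The names `conv1x1`, `cross_modal_attention`, `w_q`, `w_k`, `w_v`, and `w_out` are hypothetical, and the optional max-pooling step is left out.

```python
import numpy as np


def conv1x1(x, weight):
    """1x1 convolution on a (C_in, H, W) map: a per-pixel linear map over channels.

    weight has shape (C_out, C_in).
    """
    return np.einsum("oc,chw->ohw", weight, x)


def softmax(a, axis=-1):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(x, y, w_q, w_k, w_v, w_out):
    """Update stream x with information from stream y (assumed data flow).

    x: (C, H, W) features of the stream being updated.
    y: (C, H2, W2) features of the other modality.
    w_q, w_k, w_v: (C_r, C) weights that reduce the channel dimension.
    w_out: (C, C_r) weights that restore the channel dimension (assumption).
    """
    _, h, w = x.shape
    q = conv1x1(x, w_q).reshape(w_q.shape[0], -1)  # (C_r, N_x)
    k = conv1x1(y, w_k).reshape(w_k.shape[0], -1)  # (C_r, N_y)
    v = conv1x1(y, w_v).reshape(w_v.shape[0], -1)  # (C_r, N_y)

    # Each position of x attends over all positions of y.
    attn = softmax(q.T @ k, axis=-1)               # (N_x, N_y)
    z = (attn @ v.T).T.reshape(-1, h, w)           # (C_r, H, W)

    # Residual connection keeps the output shape equal to the input shape.
    return x + conv1x1(z, w_out)                   # (C, H, W)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.standard_normal((8, 4, 4))    # e.g. appearance features
    flow = rng.standard_normal((8, 4, 4))   # e.g. motion features
    weights = [rng.standard_normal((4, 8)) for _ in range(3)]
    w_out = rng.standard_normal((8, 4))
    out = cross_modal_attention(rgb, flow, *weights, w_out)
    print(out.shape)
```

Because of the residual connection, the module can be dropped between existing layers without changing tensor shapes, which is one way to read the compatibility claim. Passing the same tensor for `x` and `y` turns the sketch into single-modality self-attention.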

Abstract

The invention relates to a dual-stream video classification method and device based on a cross-modal attention mechanism. Unlike the traditional two-stream method, the invention fuses the information of two modalities (or even more) before predicting the result, so the fusion can be more efficient and more thorough. Because the information interaction happens at an earlier stage, a single branch already holds the important information of the other branch in the later stages. As a result, the accuracy of a single branch equals or even exceeds that of the traditional two-stream method, while a single branch has far fewer parameters than the traditional two-stream method. Compared with the non-local neural network, the attention module designed by the invention can cross modalities instead of applying the attention mechanism only within a single modality. The proposed method is equivalent to a non-local neural network when the two modalities are the same.
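The equivalence claimed at the end of the abstract can be made concrete with a worked form. This is a sketch, assuming the module follows the embedded-Gaussian form of the non-local block. The notation is introduced here for illustration and is not taken from the patent text: x_i is the feature at position i of the branch being updated, y_j is the feature at position j of the other modality, and W_θ, W_φ, W_g, W_z are learned linear maps (1x1 convolutions).

```latex
z_i = x_i + W_z \sum_{j}
      \frac{\exp\!\big((W_\theta x_i)^{\top} (W_\phi y_j)\big)}
           {\sum_{k} \exp\!\big((W_\theta x_i)^{\top} (W_\phi y_k)\big)}
      \, W_g\, y_j
```

Under this assumption, setting y = x makes the queries, keys, and values all come from the same feature map, and the expression reduces to the embedded-Gaussian non-local block.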

Description

Technical field

[0001] The invention relates to a video classification method, in particular to a dual-stream video classification method and device using an attention mechanism, and belongs to the field of computer vision.

[0002] Technical background

[0003] With the rapid development of deep learning in the image field, deep learning methods have gradually been introduced into the video field and have achieved some success. However, current technology is still far from the desired effect, and the problems it faces fall mainly into the following two areas:

[0004] First, current technology has yet to take full advantage of dynamic information. Video differs from images in that the dynamic information between frames is unique to video and very important to it. For example, even for humans, it is difficult to distinguish sub-categories of dance (such as tango and salsa) by looking at only one frame, and if the motion trajectory informatio...

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G06F16/75, G06F16/73, G06K9/62, G06N3/04

CPC: G06N3/045, G06F18/214

Inventor: 迟禄严慧田贵宇穆亚东陈刚王成成黄波韩峻糜俊青

Owner: PEKING UNIV
