
Double-stream video classification method and device based on cross-mode attention mechanism

A video classification and attention technology applied in the field of computer vision. It addresses the problems that existing methods have difficulty quickly and accurately locating key objects, that there is no method for modeling "moving objects", and that research on this topic is scarce. The effects are improved and more stable video classification accuracy and good compatibility.

Active Publication Date: 2019-08-30
PEKING UNIV +2

Problems solved by technology

[0005] Second, current technology still has difficulty quickly and accurately locating key objects.
On the question of how to capture key clues, that is, how to introduce the attention mechanism into video classification, there is relatively little research. The most representative work is the non-local neural network (Non-local Neural Networks), but that network can only attend to important information within a single modality and offers no dedicated way to model "moving objects".



Embodiment Construction

[0035] In order to make the above objects, features and advantages of the present invention more comprehensible, the present invention will be further described in detail below through specific embodiments and accompanying drawings.

[0036] 1. Configuration of cross-modal attention module

[0037] The cross-modal attention module can handle input of any dimensionality and guarantees that the input and output have the same shape, so it has excellent compatibility. Taking the 2-dimensional configuration as an example, Q, K, and V are each obtained by a 1x1 2-dimensional convolution (for 3-dimensional models, the convolution here is a 1x1x1 3-dimensional convolution). To reduce computational complexity and save GPU memory, these convolutions also reduce the channel dimension while producing Q, K, and V. To simplify the computation further, a max-pooling operation can be performed before the convolution ...
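The module described above can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: the weight shapes, the channel-reduction ratio, the placement of the max-pooling on the key/value stream, and the residual connection that restores the input shape are all assumptions filled in from standard attention designs.

```python
import numpy as np

def conv1x1(x, w):
    # x: (C_in, H, W); w: (C_out, C_in). A 1x1 convolution is a
    # per-pixel linear map over the channel dimension.
    return np.einsum('oc,chw->ohw', w, x)

def maxpool2x2(x):
    # x: (C, H, W) with even H, W; non-overlapping 2x2 max-pooling,
    # used here to shrink the attention matrix (an assumption about
    # where the pooling sits).
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).max(axis=(2, 4))

def cross_modal_attention(x_q, x_kv, wq, wk, wv, wo):
    # x_q:  (C, H, W) query-stream features (e.g. the RGB branch)
    # x_kv: (C, H, W) key/value-stream features (e.g. the flow branch)
    # wq, wk, wv: (C_r, C) channel-reducing 1x1-conv weights, C_r < C
    # wo: (C, C_r) 1x1 conv restoring the channel dimension
    C, H, W = x_q.shape
    q = conv1x1(x_q, wq).reshape(wq.shape[0], -1)      # (C_r, HW)
    kv = maxpool2x2(x_kv)                              # pool K/V first
    k = conv1x1(kv, wk).reshape(wk.shape[0], -1)       # (C_r, HW/4)
    v = conv1x1(kv, wv).reshape(wv.shape[0], -1)       # (C_r, HW/4)
    attn = q.T @ k                                     # (HW, HW/4)
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)            # row-wise softmax
    out = (v @ attn.T).reshape(wv.shape[0], H, W)      # (C_r, H, W)
    # restore channels and add a residual: output shape == input shape
    return x_q + conv1x1(out, wo)

rng = np.random.default_rng(0)
C, Cr, H, W = 8, 4, 4, 4
x_rgb  = rng.normal(size=(C, H, W))
x_flow = rng.normal(size=(C, H, W))
wq, wk, wv = (rng.normal(size=(Cr, C)) * 0.1 for _ in range(3))
wo = rng.normal(size=(C, Cr)) * 0.1
y = cross_modal_attention(x_rgb, x_flow, wq, wk, wv, wo)
print(y.shape)  # (8, 4, 4) — same shape as the query input
```

Because the output shape always matches the query input's shape, such a module can be dropped between any two layers of an existing backbone, which is the compatibility property the paragraph above claims.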



Abstract

The invention relates to a two-stream video classification method and device based on a cross-modal attention mechanism. Unlike the traditional two-stream approach, information from the two modalities (or even more) is fused before the prediction result is produced, which is more efficient and more thorough. Because this information interaction happens at an earlier stage, a single branch already carries the important information of the other branch in the later stage, so the accuracy of a single branch matches or even exceeds that of the traditional two-stream method while its parameter count is much smaller. Compared with the non-local neural network, the attention module designed by the invention can operate across modalities instead of applying the attention mechanism only within a single modality, and when the two modalities are identical the proposed method performs on par with the non-local neural network.
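The contrast the abstract draws can be sketched schematically. The code below is a toy illustration, not the patented method: the `exchange` function is a stand-in for the cross-modal attention interaction, and all feature and classifier values are random placeholders.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical per-branch features and classifier weights.
rng = np.random.default_rng(1)
n_cls, d = 5, 16
f_rgb, f_flow = rng.normal(size=d), rng.normal(size=d)
w_rgb, w_flow = rng.normal(size=(n_cls, d)), rng.normal(size=(n_cls, d))

# Traditional two-stream: each branch predicts independently and the
# class scores are fused only at the very end (late fusion).
late = 0.5 * (softmax(w_rgb @ f_rgb) + softmax(w_flow @ f_flow))

def exchange(fa, fb, alpha=0.5):
    # Stand-in for the cross-modal attention interaction: branch A's
    # features are enriched with branch B's information *before* the
    # classifier, so one branch can predict alone.
    return fa + alpha * fb

early = softmax(w_rgb @ exchange(f_rgb, f_flow))

print(late.shape, early.shape)  # both are (5,) class distributions
```

The point of the sketch: in the late-fusion scheme both classifiers (`w_rgb` and `w_flow`) must be kept at inference time, while in the early-interaction scheme a single branch suffices, which is why the abstract claims roughly half the parameters for comparable accuracy.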

Description

Technical Field

[0001] The invention relates to a video classification method, in particular to a two-stream video classification method and device using an attention mechanism, belonging to the field of computer vision.

[0002] Technical Background

[0003] With the rapid development of deep learning in the image field, deep learning methods have gradually been introduced into the video field and have achieved notable results. However, the current state of the art is still far from the desired level, and the main problems include the following two aspects:

[0004] First, current technology has yet to take full advantage of dynamic information. What distinguishes video from images is that the dynamic information between frames is unique to video and very important. For example, even for humans it is difficult to distinguish the various sub-categories of dance (such as tango and salsa) from a single frame, whereas if the motion trajectory information ...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F16/75, G06F16/73, G06K9/62, G06N3/04
CPC: G06N3/045, G06F18/214
Inventor 迟禄严慧田贵宇穆亚东陈刚王成成黄波韩峻糜俊青
Owner PEKING UNIV