
Double-stream video classification method and device based on cross-mode attention mechanism

A video classification and attention technology applied in the field of computer vision. It addresses the problems that existing methods have difficulty quickly and accurately locating key objects, that there is no method for modeling "moving objects", and that research on this topic is scarce. The effects are improved and more stable video classification accuracy and good compatibility.

Active Publication Date: 2019-08-30
PEKING UNIV +2

Problems solved by technology

[0005] Second, current technology still has difficulty quickly and accurately locating key objects.
On the question of how to capture key clues, that is, how to introduce the attention mechanism into video classification, there is relatively little research. The most representative work is the non-local neural network (Non-local Neural Networks), but that network can only attend to important information within a single modality and offers no dedicated way to model "moving objects".



Embodiment Construction

[0035] In order to make the above objects, features and advantages of the present invention more comprehensible, the present invention will be further described in detail below through specific embodiments and accompanying drawings.

[0036] 1. Configuration of cross-modal attention module

[0037] The cross-modal attention module can handle input of any dimensionality and guarantees that the input and output have the same shape, so it has excellent compatibility. Taking the 2-dimensional configuration as an example, Q, K, and V are each obtained by a 1x1 2-dimensional convolution (for 3-dimensional models, the convolution here is a 1x1x1 3-dimensional convolution). To reduce computational complexity and save GPU memory, these convolutions also reduce the channel dimension while producing Q, K, and V. To simplify the computation further, a max-pooling operation can be performed before the convolution ...
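The module described above can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: the weight shapes, the channel-reduction ratio, the placement of the max-pooling on the key/value stream, and the residual connection that restores the input shape are all assumptions filled in from standard attention designs.

```python
import numpy as np

def conv1x1(x, w):
    # x: (C_in, H, W); w: (C_out, C_in). A 1x1 convolution is a
    # per-pixel linear map over the channel dimension.
    return np.einsum('oc,chw->ohw', w, x)

def maxpool2x2(x):
    # x: (C, H, W) with even H, W; non-overlapping 2x2 max-pooling,
    # used here to shrink the attention matrix (an assumption about
    # where the pooling sits).
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).max(axis=(2, 4))

def cross_modal_attention(x_q, x_kv, wq, wk, wv, wo):
    # x_q:  (C, H, W) query-stream features (e.g. the RGB branch)
    # x_kv: (C, H, W) key/value-stream features (e.g. the flow branch)
    # wq, wk, wv: (C_r, C) channel-reducing 1x1-conv weights, C_r < C
    # wo: (C, C_r) 1x1 conv restoring the channel dimension
    C, H, W = x_q.shape
    q = conv1x1(x_q, wq).reshape(wq.shape[0], -1)      # (C_r, HW)
    kv = maxpool2x2(x_kv)                              # pool K/V first
    k = conv1x1(kv, wk).reshape(wk.shape[0], -1)       # (C_r, HW/4)
    v = conv1x1(kv, wv).reshape(wv.shape[0], -1)       # (C_r, HW/4)
    attn = q.T @ k                                     # (HW, HW/4)
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)            # row-wise softmax
    out = (v @ attn.T).reshape(wv.shape[0], H, W)      # (C_r, H, W)
    # restore channels and add a residual: output shape == input shape
    return x_q + conv1x1(out, wo)

rng = np.random.default_rng(0)
C, Cr, H, W = 8, 4, 4, 4
x_rgb  = rng.normal(size=(C, H, W))
x_flow = rng.normal(size=(C, H, W))
wq, wk, wv = (rng.normal(size=(Cr, C)) * 0.1 for _ in range(3))
wo = rng.normal(size=(C, Cr)) * 0.1
y = cross_modal_attention(x_rgb, x_flow, wq, wk, wv, wo)
print(y.shape)  # (8, 4, 4) — same shape as the query input
```

Because the output shape always matches the query input's shape, such a module can be dropped between any two layers of an existing backbone, which is the compatibility property the paragraph above claims.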



Abstract

The invention relates to a two-stream video classification method and device based on a cross-modal attention mechanism. Unlike the traditional two-stream approach, information from the two modalities (or even more) is fused before the prediction result is produced, which is more efficient and more thorough. Because this information interaction happens at an earlier stage, a single branch already carries the important information of the other branch in the later stage, so the accuracy of a single branch matches or even exceeds that of the traditional two-stream method while its parameter count is much smaller. Compared with the non-local neural network, the attention module designed by the invention can operate across modalities instead of applying the attention mechanism only within a single modality, and when the two modalities are identical the proposed method performs on par with the non-local neural network.
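The contrast the abstract draws can be sketched schematically. The code below is a toy illustration, not the patented method: the `exchange` function is a stand-in for the cross-modal attention interaction, and all feature and classifier values are random placeholders.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical per-branch features and classifier weights.
rng = np.random.default_rng(1)
n_cls, d = 5, 16
f_rgb, f_flow = rng.normal(size=d), rng.normal(size=d)
w_rgb, w_flow = rng.normal(size=(n_cls, d)), rng.normal(size=(n_cls, d))

# Traditional two-stream: each branch predicts independently and the
# class scores are fused only at the very end (late fusion).
late = 0.5 * (softmax(w_rgb @ f_rgb) + softmax(w_flow @ f_flow))

def exchange(fa, fb, alpha=0.5):
    # Stand-in for the cross-modal attention interaction: branch A's
    # features are enriched with branch B's information *before* the
    # classifier, so one branch can predict alone.
    return fa + alpha * fb

early = softmax(w_rgb @ exchange(f_rgb, f_flow))

print(late.shape, early.shape)  # both are (5,) class distributions
```

The point of the sketch: in the late-fusion scheme both classifiers (`w_rgb` and `w_flow`) must be kept at inference time, while in the early-interaction scheme a single branch suffices, which is why the abstract claims roughly half the parameters for comparable accuracy.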

Description

Technical Field

[0001] The invention relates to a video classification method, in particular to a two-stream video classification method and device using an attention mechanism, belonging to the field of computer vision.

[0002] Technical Background

[0003] With the rapid development of deep learning in the image field, deep learning methods have gradually been introduced into the video field and have achieved notable results. However, the current state of the art is still far from the desired level, and the main problems include the following two aspects:

[0004] First, current technology has yet to take full advantage of dynamic information. What distinguishes video from images is that the dynamic information between frames is unique to video and very important. For example, even for humans it is difficult to distinguish the various sub-categories of dance (such as tango and salsa) from a single frame, whereas if the motion trajectory information ...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F16/75, G06F16/73, G06K9/62, G06N3/04
CPC: G06N3/045, G06F18/214
Inventor 迟禄严慧田贵宇穆亚东陈刚王成成黄波韩峻糜俊青
Owner PEKING UNIV