Video question answering method based on object-oriented double-flow attention network

An object-oriented attention technique, applied to neural learning methods, biological neural network models, digital video signal processing, etc. It addresses the problem that image question-answering approaches cannot be applied directly to video question-answering tasks, and improves the ability to explore intra-modal interactions and inter-modal semantic alignment.

Pending Publication Date: 2022-05-03
HANGZHOU DIANZI UNIV

AI Technical Summary

Benefits of technology

This technology represents video content with two parallel streams: one captures the static appearance of foreground objects, the other their dynamic behavior. By modeling both properties at once, together with each object's spatio-temporal position and scene context, the method captures interactions among objects and their surroundings (such as backgrounds or scenes), yields more accurate representations of real-world events such as human actions and movements, and improves learning from large collections of video data.

Problems solved by technology

Video question answering (video QA) requires understanding not only how objects, backgrounds, and other parts of a scene appear in individual frames, but also how they relate to one another over time during question answering. Existing models have difficulty capturing such complex behaviors involving multiple modalities of interest.



Examples


Embodiment

[0118] As shown in Figure 1 and Figure 2, a video question answering method based on an object-oriented two-stream attention network comprises the following steps:

[0119] Step (1): preprocess the input data. For an input video, first sample the video frames by uniform (average) sampling; in the present invention, T = 10 frames are sampled per video. Next, the Faster R-CNN object detection algorithm is applied to each frame to generate target objects, yielding multiple candidate boxes. In addition, convolutional networks extract static appearance features and dynamic behavior features for each video frame: a ResNet-152 network trained on the ImageNet image library extracts the static appearance features, and an I3D network trained on the Kinetics action-recognition dataset extracts the dynamic behavior features. Finally, the RoIAlign method ...
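The preprocessing in Step (1) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the real pipeline uses Faster R-CNN for boxes, ResNet-152 and I3D for features, and RoIAlign for region pooling, all of which are stubbed here with random placeholders; the function names and feature dimensions are assumptions.

```python
import numpy as np

def uniform_sample_indices(num_frames: int, t: int = 10) -> np.ndarray:
    """Pick T frame indices evenly spaced over the video (average sampling)."""
    return np.linspace(0, num_frames - 1, t).round().astype(int)

def preprocess_video(video: np.ndarray, t: int = 10,
                     app_dim: int = 2048, mot_dim: int = 1024, n_obj: int = 5):
    """Stub of Step (1): sample T frames, then produce per-frame appearance
    and motion features plus per-object region features.

    In the patent, appearance features come from ResNet-152 (ImageNet),
    motion features from I3D (Kinetics), candidate boxes from Faster R-CNN,
    and region features from RoIAlign; here they are random placeholders.
    """
    idx = uniform_sample_indices(len(video), t)
    frames = video[idx]                                # (T, H, W, 3)
    appearance = np.random.randn(t, app_dim)           # ResNet-152 stand-in
    motion = np.random.randn(t, mot_dim)               # I3D stand-in
    object_feats = np.random.randn(t, n_obj, app_dim)  # RoIAlign stand-in
    return frames, appearance, motion, object_feats

video = np.zeros((120, 224, 224, 3))                   # dummy 120-frame clip
frames, app, mot, objs = preprocess_video(video)
print(frames.shape, app.shape, mot.shape, objs.shape)
```

Uniform sampling keeps the first and last frames and spaces the rest evenly, so a 120-frame clip is reduced to T = 10 representative frames before any feature extraction.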


Abstract

The invention discloses a video question answering method based on an object-oriented two-stream attention network. The visual content of a video is represented with a two-stream mechanism: one stream is the static appearance stream of the foreground objects, and the other is the dynamic behavior stream of those objects. In each stream, an object's features include not only the feature of the object itself but also the object's spatio-temporal encoding and the context features of the scene in which it appears, so that the relative spatio-temporal relations and context-aware relations between objects can be explored during the deep feature extraction performed by subsequent graph convolution operations. Meanwhile, the two-stream mechanism addresses the problem that previous video question-answering models consider only the static characteristics of objects and lack analysis of dynamic information. The method improves the ability to explore intra-modal interaction and inter-modal semantic alignment, and obtains better results on related video question answering datasets.
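The per-object representation described above can be sketched as a concatenation of three parts: the object's own feature, a spatio-temporal code, and the scene context feature. This is an illustrative sketch only; the particular six-dimensional box-plus-time encoding, the function names, and the feature dimensions are assumptions, not the patent's exact formulation.

```python
import numpy as np

def spatiotemporal_code(box, t_idx: int, num_frames: int) -> np.ndarray:
    """Encode an object's normalized box (x1, y1, x2, y2), its area, and its
    frame position as a small spatio-temporal vector (an assumed encoding)."""
    x1, y1, x2, y2 = box
    return np.array([x1, y1, x2, y2, (x2 - x1) * (y2 - y1), t_idx / num_frames])

def object_node(obj_feat, box, t_idx, num_frames, scene_feat) -> np.ndarray:
    """One stream's object node: the object's own feature, its spatio-temporal
    code, and the scene (context) feature, concatenated."""
    code = spatiotemporal_code(box, t_idx, num_frames)
    return np.concatenate([obj_feat, code, scene_feat])

obj = np.random.randn(2048)    # e.g. appearance-stream region feature
scene = np.random.randn(2048)  # frame-level context feature
node = object_node(obj, (0.1, 0.2, 0.5, 0.8), t_idx=3, num_frames=10,
                   scene_feat=scene)
print(node.shape)  # (4102,) = 2048 + 6 + 2048
```

The same construction is applied in both streams, with the object feature coming from the appearance features in one stream and from the motion features in the other, so that later relational reasoning can compare objects across space, time, and context.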


Application Information

Owner HANGZHOU DIANZI UNIV