Method for solving video question and answer problem by utilizing graph theory-based multi-interaction network mechanism

A video question answering technique, applied to answer generation for questions about videos, that addresses problems such as the lack of temporal dynamic information modeling

Active Publication Date: 2020-04-14
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to overcome a shortcoming of the prior art, namely its lack of modeling of the temporal dynamic information in video. A video often contains both the shape of objects and their movement information, and the video information related to a question is scattered across some target frames of the video. The present invention therefore provides a method for solving video question answering using a graph theory-based multi-interaction network mechanism. The specific technical solution adopted by the present invention is:

Examples

Embodiment

[0170] The present invention is verified experimentally on the well-known TGIF-QA, MSVD-QA and MSRVTT-QA datasets. Tables 1 to 3 give the sample statistics of the three datasets used for training and testing in this embodiment.

[0171] Table 1: Statistics of samples in the TGIF-QA dataset

[0172]

[0173] Table 2: Statistics of samples in the MSVD-QA dataset

[0174]

[0175] Table 3: Statistics of samples in the MSRVTT-QA dataset

[0176]

[0177] To evaluate the performance of the algorithm objectively, the present invention adopts different evaluation metrics for different question types. For state transition, repeated action, and single-frame image question answering, classification accuracy (ACC) is used; for repetition counting, the mean squared error (MSE) between the correct answer and the predicted answer is used.
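
The two metrics named above can be sketched as follows. This is a minimal illustration using the standard definitions of ACC and MSE; the function names and the toy answers are assumptions and do not come from the patent.

```python
from typing import Sequence


def classification_accuracy(predicted: Sequence[str], correct: Sequence[str]) -> float:
    """Fraction of questions whose predicted answer equals the correct answer."""
    if len(predicted) != len(correct) or not correct:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == c for p, c in zip(predicted, correct))
    return hits / len(correct)


def mean_squared_error(predicted: Sequence[float], correct: Sequence[float]) -> float:
    """Average squared difference between predicted and correct counts."""
    if len(predicted) != len(correct) or not correct:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - c) ** 2 for p, c in zip(predicted, correct)) / len(correct)


if __name__ == "__main__":
    # Hypothetical answers for multiple-choice style questions (ACC).
    print(classification_accuracy(["run", "jump", "sit", "wave"],
                                  ["run", "jump", "stand", "wave"]))
    # Hypothetical answers for repetition-counting questions (MSE).
    print(mean_squared_error([2, 4, 6], [2, 5, 8]))
```

Higher ACC is better, while lower MSE is better, so the two columns in a results table are read in opposite directions.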

[0178] The final experimental results are shown in Table 4-Table 6:

[0179] Table 4: Comparison ...

Abstract

The invention discloses a method for solving the video question answering problem using a graph theory-based multi-interaction network mechanism. The method comprises the following steps: 1) obtaining a frame-level video representation from the video using a ResNet network; 2) using a Mask R-CNN network to obtain the existence and position features of objects; 3) extracting word-level information from the question using a GloVe network; 4) constructing a graph with a graph theory-based GNN network, introducing a message mechanism to iterate the graph, and finally obtaining feature representations of object existence and of the connections between objects; 5) introducing multiple interactions and learning with a feedforward neural network to obtain question-related feature representations of object existence and inter-object dynamic relationships, together with frame-level and segment-level video representations; and 6) using different strategies in the question answering module for different types of questions. Through this mechanism, the method captures the spatio-temporal dependencies and the dynamic semantic interaction information between objects, achieves a deeper understanding of the video, and thereby gives more accurate answers.
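
Step 4 describes building a graph over detected objects and iterating it with a message mechanism. The sketch below shows one generic form of message passing. The mean aggregation, the `mix` weight, the function name `message_passing`, and the toy object graph are all assumptions for illustration; the abstract does not specify the patent's actual update rule.

```python
from typing import Dict, List

Vector = List[float]


def message_passing(
    features: Dict[str, Vector],
    neighbors: Dict[str, List[str]],
    steps: int = 2,
    mix: float = 0.5,
) -> Dict[str, Vector]:
    """Iteratively blend each node's feature with the mean of its neighbors'.

    Hypothetical update rule (an assumption, not the patent's):
        h_v <- (1 - mix) * h_v + mix * mean(h_u for u in neighbors[v])
    """
    state = {node: list(vec) for node, vec in features.items()}
    for _ in range(steps):
        updated: Dict[str, Vector] = {}
        for node, vec in state.items():
            nbrs = neighbors.get(node, [])
            if not nbrs:
                # An isolated node receives no messages and keeps its feature.
                updated[node] = list(vec)
                continue
            dim = len(vec)
            mean_msg = [sum(state[u][i] for u in nbrs) / len(nbrs) for i in range(dim)]
            updated[node] = [(1 - mix) * vec[i] + mix * mean_msg[i] for i in range(dim)]
        state = updated  # all nodes are updated synchronously
    return state


if __name__ == "__main__":
    # Toy graph: two connected objects and one isolated object.
    feats = {"person": [1.0, 0.0], "ball": [0.0, 1.0], "tree": [2.0, 2.0]}
    edges = {"person": ["ball"], "ball": ["person"], "tree": []}
    for name, vec in message_passing(feats, edges, steps=1).items():
        print(name, vec)
```

After each iteration a node's feature carries information from nodes one hop further away, which is how a graph over objects can come to encode relationships between them.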

Description

Technical field

[0001] The invention relates to answer generation for video question answering, in particular to a method for solving video question answering using a graph theory-based multi-interaction network mechanism.

Background technique

[0002] Video question answering is an important problem in the field of video information retrieval. The goal of this task is to automatically generate answers for given videos and their corresponding questions.

[0003] Existing techniques mainly address question answering over static images. Although current techniques perform well on static image question answering, such methods lack modeling of the temporal dynamic information in videos, so they do not extend well to video question answering tasks.

[0004] For the situation that the video often contains the shape of the object and its movement information, and the video information related to the problem is sc...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/732, G06F16/783, G06F16/787, G06N3/04, G06N3/08
CPC: G06F16/7335, G06F16/7343, G06F16/7844, G06F16/7837, G06F16/787, G06N3/08, G06N3/045
Inventor: 赵洲, 卢航, 顾茅, 陈默沙
Owner: ZHEJIANG UNIV