Method for solving video question and answer problem by utilizing graph theory-based multi-interaction network mechanism

A video question answering technology, applied in the field of answer generation for video questions, that addresses problems such as the lack of temporal dynamic information modeling.

Active Publication Date: 2020-04-14

ZHEJIANG UNIV

4 Cites, 6 Cited by


AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to solve the problems in the prior art, namely the lack of modeling of temporal dynamic information in video. A video often contains both the shapes of objects and their movement information, and the video information related to the question is scattered across some target frames of the video. To address this, the present invention provides a method for solving video question answering problems using a graph theory-based multi-interaction network mechanism. The specific technical solution adopted by the present invention is:



Examples

Embodiment

[0170] The present invention is verified experimentally on the well-known data sets TGIF-QA, MSVD-QA and MSRVTT-QA. Tables 1 to 3 summarize the samples used for training and testing on the three data sets in this embodiment.

[0171] Table 1: Statistics of samples in the TGIF-QA dataset

[0172]

[0173] Table 2: Statistics of samples in the MSVD-QA dataset

[0174]

[0175] Table 3: Statistics of samples in the MSRVTT-QA dataset

[0176]

[0177] To evaluate the performance of the algorithm objectively, the present invention adopts different evaluation metrics for different question types. For state transition, repeated action, and single-frame image question answering, classification accuracy (ACC) is used; for repetition counting, the mean squared error (MSE) between the correct answer and the predicted answer is used.
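The two metrics named in [0177] can be sketched as follows. This is a minimal illustration, not the evaluation code of the invention: the function names `accuracy` and `mean_squared_error` and the toy answer lists are assumptions introduced here, and answers for the classification-style questions are assumed to be encoded as integer class indices.

```python
from typing import Sequence


def accuracy(predicted: Sequence[int], correct: Sequence[int]) -> float:
    """Fraction of questions whose predicted answer index equals the correct one."""
    if len(predicted) != len(correct) or not correct:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == c for p, c in zip(predicted, correct))
    return hits / len(correct)


def mean_squared_error(predicted: Sequence[float], correct: Sequence[float]) -> float:
    """Average squared difference between predicted and correct counts."""
    if len(predicted) != len(correct) or not correct:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - c) ** 2 for p, c in zip(predicted, correct)) / len(correct)


# Hypothetical toy data, for illustration only (not results from the patent).
acc = accuracy([2, 0, 1, 3], [2, 1, 1, 3])        # three of four answers match
mse = mean_squared_error([3, 5, 2], [4, 5, 4])    # squared errors: 1, 0, 4
print(f"ACC = {acc:.2f}, MSE = {mse:.3f}")
```

Higher ACC is better, while lower MSE is better, so the two kinds of result are not directly comparable across question types.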

[0178] The final experimental results are shown in Table 4-Table 6:

[0179] Table 4: Comparison ...


Abstract

The invention discloses a method for solving the video question answering problem by utilizing a graph theory-based multi-interaction network mechanism. The method comprises the following steps: 1) obtaining a frame-level video representation by applying a ResNet network to the video; 2) using a Mask R-CNN network to obtain the existence and position features of objects; 3) extracting word-level information from the question by using GloVe; 4) constructing a graph with a graph theory-based GNN network, introducing a message mechanism to iterate over the graph, and finally obtaining feature representations of object existence and inter-object connections; 5) introducing multiple interactions, and learning with a feedforward neural network to obtain question-related feature representations of object existence and inter-object dynamic relationships, together with frame-level and segment-level video representations; and 6) using different strategies for different types of question answering modules. According to the method, the spatio-temporal dependencies and the dynamic semantic interaction information between objects are obtained through this mechanism, a deeper understanding of the video is achieved, and more accurate answers are given.
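Step 4 of the abstract iterates a graph with a message mechanism. The sketch below illustrates the general idea of message passing over an object graph; it is not the update rule of the invention, which this page does not give. The function name `message_passing`, the weighted-average aggregation, and the `tanh` update are all assumptions chosen for illustration. Nodes stand for detected objects and `adjacency[i][j]` is a hypothetical non-negative connection weight between objects i and j.

```python
import math
from typing import List

Vector = List[float]


def message_passing(node_feats: List[Vector],
                    adjacency: List[List[float]],
                    num_iters: int = 2) -> List[Vector]:
    """Run simple message-passing rounds over an object graph.

    In each round, node i receives the adjacency-weighted average of the
    other nodes' states and combines it with its own state through tanh.
    (Assumed update rule, for illustration only.)
    """
    states = [list(v) for v in node_feats]  # copy so the input is not mutated
    n = len(states)
    dim = len(states[0]) if states else 0
    for _ in range(num_iters):
        new_states = []
        for i in range(n):
            total_w = sum(adjacency[i][j] for j in range(n) if j != i)
            msg = [0.0] * dim
            if total_w > 0:
                for j in range(n):
                    if j == i:
                        continue
                    w = adjacency[i][j] / total_w  # normalised edge weight
                    for k in range(dim):
                        msg[k] += w * states[j][k]
            # Combine the node's own state with the incoming message.
            new_states.append([math.tanh(states[i][k] + msg[k]) for k in range(dim)])
        states = new_states
    return states


# Hypothetical two-object graph with 2-dimensional features.
feats = [[1.0, 0.0], [0.0, 1.0]]
adj = [[0.0, 1.0], [1.0, 0.0]]
print(message_passing(feats, adj, num_iters=1))
```

After each round, a node's state reflects information from its neighbours, so repeated rounds let information propagate between objects that are not directly connected.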

Description

Technical field

[0001] The invention relates to answer generation for video question answering, and in particular to a method for solving video question answering problems by using a graph theory-based multi-interaction network mechanism.

Background

[0002] Video question answering is an important problem in the field of video information retrieval. The goal of this task is to automatically generate an answer for a given video and a corresponding question.

[0003] Existing techniques mainly address question answering over static images. Although current methods perform well on static image question answering, they do not model the temporal dynamic information in videos, so they do not extend well to video question answering tasks.

[0004] For the situation that the video often contains the shape of the object and its movement information, and the video information related to the question is sc...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F16/732, G06F16/783, G06F16/787, G06N3/04, G06N3/08

CPC: G06F16/7335, G06F16/7343, G06F16/7844, G06F16/7837, G06F16/787, G06N3/08, G06N3/045

Inventor: 赵洲, 卢航, 顾茅, 陈默沙

Owner: ZHEJIANG UNIV
