Method for solving video question-answer problems by using multi-granularity convolutional network self-attention context network mechanism

This multi-granularity convolutional network technology applies to video data retrieval, neural learning methods, and biological neural network models. It addresses the lack of contextual information modeling and aims at faster computation, better time efficiency, and high accuracy.

Inactive Publication Date: 2020-04-10
ZHEJIANG UNIV
2 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to overcome a shortcoming of the prior art, which does not model the contextual information in a video. A video typically contains both the appearance of objects and their motion, and the video information relevant to a question is scattered across a few target frames. The present invention therefore provides a method for generating answers to video-related questions using a multi-granularity convolutional self-attention context network.
Questions and answers in a video dialogue often carry contextual information, so the present invention uses the multi-granularity convolutional self-attention context network to obtain a context-aware joint video representation related to the question.

Examples

Embodiment

[0116] The present invention is verified experimentally on data sets produced by a professional crowdsourcing labeling company. Two data sets are used: YouTubeClips and TACoS-MultiLevel. The YouTubeClips data set contains 1987 video clips and 66806 question-answer pairs, with 60 frames per video. The TACoS-MultiLevel data set contains 1303 video clips and 37228 question-answer pairs, with 80 frames per video. The present invention then applies the following preprocessing to the constructed video question-answering data sets:

[0117] 1) For questions and answers, the present invention uses a pre-trained word2vec model to extract semantic representations of the questions and answers. Specifically, the vocabulary contains 6500 words, and each word vector has 100 dimensions.
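The lookup described in step 1) can be sketched as follows. This is a minimal illustration, not the invention's implementation: the random vectors stand in for trained word2vec weights, and the `<unk>` fallback token, the whitespace tokenizer, and the helper names are assumptions introduced here. Only the vocabulary size (6500) and the vector dimension (100) come from the text.

```python
import random

VOCAB_SIZE = 6500   # vocabulary size stated in the text
EMBED_DIM = 100     # word-vector dimension stated in the text
UNK = "<unk>"       # hypothetical token for out-of-vocabulary words


def build_embedding_table(vocab, dim=EMBED_DIM, seed=0):
    """Return {word: vector}. Random vectors stand in for trained word2vec weights."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for word in vocab}


def embed_sentence(sentence, table):
    """Lowercase, split on whitespace, and look up each token (falling back to UNK)."""
    return [table.get(tok, table[UNK]) for tok in sentence.lower().split()]


# Placeholder vocabulary: UNK, a few real words, and dummy entries up to the stated size.
vocab = [UNK, "what", "is", "the", "man", "doing"]
vocab += [f"word{i}" for i in range(VOCAB_SIZE - len(vocab))]

table = build_embedding_table(vocab)
vectors = embed_sentence("What is the man cooking", table)  # "cooking" maps to UNK
```

Each question or answer becomes a list of 100-dimensional vectors, one per token, which later stages can consume as a sequence.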

[0118] 2) For the videos of the YouTubeClips and TACoS-MultiLevel data sets, resize each frame to 224×224, and use...
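The resizing part of step 2) can be illustrated with a small sketch. The text does not say which interpolation is used, so nearest-neighbor sampling is an assumption made here for simplicity, and `resize_nearest` is a hypothetical helper. A practical pipeline would more likely rely on an image library.

```python
def resize_nearest(frame, out_h=224, out_w=224):
    """Resize a frame (list of rows of pixels) with nearest-neighbor sampling.

    Nearest-neighbor is an assumption; the source only states the 224x224 target size.
    """
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


# Toy 480x640 frame whose "pixels" record their own (row, column) position.
frame = [[(y, x) for x in range(640)] for y in range(480)]
resized = resize_nearest(frame)
```

Applying the same function to every frame of a clip yields a sequence of 224×224 frames of the stated per-video length (60 or 80 frames).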

Abstract

The invention discloses a method for solving video question-answer problems by using a multi-granularity convolutional network self-attention context network mechanism. The method mainly comprises the following steps: 1) for a group of videos, frame-level and segment-level video representations are obtained by using a pre-trained VGG network and a 3D-Conv network; 2) for the question word embeddings and answer word embeddings of the dialogue history and of the new question, the multi-granularity convolutional self-attention network mechanism and a sentence-level context attention mechanism are adopted to obtain joint representations related to the question; and 3) a question-level temporal attention mechanism and a fused attention network mechanism are adopted to obtain joint video representations related to the question and to generate answers to the questions asked about the videos. Compared with common video question-answer solutions, the method uses the multi-granularity convolutional self-attention network to combine visible information with dialogue history information, and so generates answers that better meet requirements. The method achieves better results on video question-answer problems than traditional methods.
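The idea behind the question-level temporal attention in step 3) can be sketched as follows. This is a simplified illustration under stated assumptions: scaled dot-product scoring is swapped in because the exact scoring function is not given in this excerpt, there are no learned parameters, and `temporal_attention` is a hypothetical name.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(frame_feats, question_vec):
    """Weight each frame by its similarity to the question, then pool over time.

    Scaled dot-product scoring is an assumption, not the patent's stated formula.
    """
    d = len(question_vec)
    scores = [
        sum(f * q for f, q in zip(frame, question_vec)) / math.sqrt(d)
        for frame in frame_feats
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * frame[i] for w, frame in zip(weights, frame_feats))
        for i in range(d)
    ]
    return weights, pooled


# Toy 2-dimensional features for three frames and one question.
frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
question = [2.0, 0.0]
weights, pooled = temporal_attention(frames, question)
```

Frames that align with the question vector receive larger weights, which matches the observation that question-relevant information sits in a few target frames.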

Description

Technical field

[0001] The invention relates to answer generation for video question answering, and in particular to a method for solving video question-answer problems by using a multi-granularity convolutional network self-attention context network mechanism.

Background

[0002] Video question answering is an important problem in the field of video information retrieval. Its goal is to automatically generate an answer given a video and a corresponding question.

[0003] Existing technologies mainly address question answering over static images and have achieved good results there. Video question answering remains challenging, for example because of the contextual correlation between the visible information and the text information in a video. Static images do not reflect this correlation, so approaches built on them ignore much of the contextual information. The present invention uses a self-attention mechanism to capture context information. Compared with the current R...
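How self-attention can capture context at several granularities may be sketched as follows. This is one plausible reading, not the invention's architecture: the sequence is split into consecutive blocks of a given size, and plain scaled dot-product self-attention is applied inside each block. The block sizes, the absence of learned projections and of the convolutional layers, and the function names are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(block):
    """Scaled dot-product self-attention with queries = keys = values = inputs."""
    d = len(block[0])
    out = []
    for q in block:
        weights = softmax(
            [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in block]
        )
        out.append([sum(w * v[i] for w, v in zip(weights, block)) for i in range(d)])
    return out


def multi_granularity_attention(seq, granularities=(2, 4)):
    """Apply self-attention within blocks of each size; return {block_size: outputs}.

    The block sizes are hypothetical; the source does not list them in this excerpt.
    """
    results = {}
    for g in granularities:
        attended = []
        for start in range(0, len(seq), g):
            attended.extend(self_attention(seq[start:start + g]))
        results[g] = attended
    return results


# Toy sequence of four 2-dimensional element features (words or frames).
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.0]]
outputs = multi_granularity_attention(seq)
```

Small blocks mix each element only with its immediate neighbors, while larger blocks expose longer-range context. A full model would combine the per-granularity outputs, for example through convolution or learned fusion.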

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/75, G06F16/738, G06F16/783, G06K9/62, G06N3/04, G06N3/08
CPC: G06F16/75, G06F16/783, G06F16/738, G06N3/08, G06N3/047, G06N3/045, G06F18/241, G06F18/2415
Inventor: 赵洲 (Zhao Zhou), 李国昌 (Li Guochang), 金韦克 (Jin Weike)
Owner: ZHEJIANG UNIV