Method for solving video questions and answers by use of hierarchical space-time attention coder-decoder network mechanism

An encoder-decoder and attention technology, applied in the fields of instruments, computer parts, and special data processing applications, addressing problems such as the inability to make good use of the sequential relationship between video frames

Active Publication Date: 2017-12-12
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to overcome the deficiency of the prior art that the sequential relationship between the frames of a video cannot be well exploited. To this end, the present invention provides a method for generating relevant answers to open-ended video questions using a layered spatio-temporal attention encoder-decoder network.
[0006] A layered spatio-temporal attention encoder-decoder network is used to solve the open-ended video question answering problem, comprising the following steps:

Method used



Examples


Embodiment

[0077] The present invention is experimentally verified on a self-constructed data set containing 201,068 GIF clips and 287,933 text descriptions, from which question-answer pairs are generated from the video descriptions. The verification experiments cover 4 types of questions, concerning respectively the object, number, color and location in the video. The constructed video question answering data set is then preprocessed as follows (a code sketch of this preprocessing is given after the list):

[0078] 1) Sample 25 frames from each video, resize each frame to 224×224, and use VGGNet to obtain a 4096-dimensional feature expression of each frame. For each frame, the present invention selects 3 regions as candidate regions.

[0079] 2) For the questions and answers, the present invention utilizes a word2vec model trained in advance to extract the semantic expressions of the questions and answer...
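Below is a minimal sketch of the preprocessing described in [0078]-[0079]. It assumes torchvision's VGG16 as a stand-in for the VGGNet mentioned in the text (the 4096-dimensional feature is taken from the first fully connected layer) and a pre-trained gensim word2vec model for the question/answer embeddings; the selection of 3 candidate regions per frame is not shown, and all names here are illustrative rather than taken from the patent.

```python
# Preprocessing sketch for steps [0078]-[0079] (assumptions: torchvision VGG16,
# 4096-d feature from the first FC layer, gensim word2vec embeddings).
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# VGG16 with the classifier truncated after the first FC layer -> 4096-d output.
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
vgg.classifier = torch.nn.Sequential(*list(vgg.classifier.children())[:2])
vgg.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),  # reset each frame to 224x224
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def frame_features(frame_paths, num_frames=25):
    """Sample 25 frames per video and return a (25, 4096) feature matrix."""
    step = max(len(frame_paths) // num_frames, 1)
    sampled = frame_paths[::step][:num_frames]
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in sampled])
    with torch.no_grad():
        return vgg(batch)  # (num_frames, 4096)

def question_embedding(question, w2v):
    """Average pre-trained word2vec vectors as the semantic expression of a question."""
    vecs = [w2v[w] for w in question.lower().split() if w in w2v]
    return sum(vecs) / len(vecs) if vecs else None
```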



Abstract

The invention discloses a method for solving video question answering by use of a hierarchical space-time attention coder-decoder network mechanism. The method mainly comprises the steps that (1) a space-time attention encoding neural network is trained on a training set of videos, questions and answers, and a co-expression of the videos and the questions is learned; and (2) a decoding neural network is trained on the output of the trained encoding network, i.e. the video-question co-expression, together with the relevant answers, and is used to output corresponding natural language answers according to the co-expression of the videos and the questions. Compared with ordinary video question answering solutions, a temporal attention mechanism is utilized so that the sequential relationship between video frames is better exploited; meanwhile, a spatial attention mechanism is utilized to pinpoint the key locations within the video frames, so that the characteristics of the videos and the questions are reflected more accurately and answers that better meet the requirements are generated. The effect of the method on video question answering problems is better than that of traditional methods.
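As an illustration of the two stages summarized above, the following is a minimal sketch in PyTorch. The spatial attention over candidate regions, the temporal attention over frames, and the encoder/decoder split follow the abstract; everything else (hidden sizes, the GRU layers, the multiplicative fusion of the video and question representations, and the fixed-length greedy decoder) is an assumption made for this example and is not taken from the patent text.

```python
# Sketch of the two-stage structure in the abstract (all dimensions and layer
# choices are assumptions made for illustration).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatioTemporalAttentionEncoder(nn.Module):
    """Step (1): learn a co-expression of video and question.

    Spatial attention weights the candidate regions inside each frame;
    temporal attention then weights the frames, conditioned on the question.
    """
    def __init__(self, region_dim=4096, q_dim=300, hidden=512):
        super().__init__()
        self.q_proj = nn.Linear(q_dim, hidden)
        self.r_proj = nn.Linear(region_dim, hidden)
        self.spatial_score = nn.Linear(hidden, 1)
        self.temporal_score = nn.Linear(hidden, 1)
        self.frame_rnn = nn.GRU(hidden, hidden, batch_first=True)

    def forward(self, regions, question):
        # regions: (B, T, R, region_dim)   question: (B, q_dim)
        q = torch.tanh(self.q_proj(question))                # (B, H)
        r = torch.tanh(self.r_proj(regions))                 # (B, T, R, H)
        # spatial attention over the R candidate regions of every frame
        s = self.spatial_score(r + q[:, None, None, :])      # (B, T, R, 1)
        frames = (F.softmax(s, dim=2) * r).sum(dim=2)        # (B, T, H)
        # a sequence model keeps the order relationship between frames
        frames, _ = self.frame_rnn(frames)                   # (B, T, H)
        # temporal attention over the T frames
        t = self.temporal_score(frames + q[:, None, :])      # (B, T, 1)
        video = (F.softmax(t, dim=1) * frames).sum(dim=1)    # (B, H)
        return video * q                                     # co-expression (assumed fusion)

class AnswerDecoder(nn.Module):
    """Step (2): decode the co-expression into a natural-language answer."""
    def __init__(self, hidden=512, vocab=10000):
        super().__init__()
        self.rnn = nn.GRUCell(hidden, hidden)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, coexpr, max_len=5):
        h, logits = coexpr, []
        for _ in range(max_len):
            h = self.rnn(coexpr, h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)                    # (B, max_len, vocab)
```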

Description

Technical field

[0001] The present invention relates to video question answering text generation, and more particularly to a method for generating answers to video-related questions using a layered spatio-temporal attention encoder-decoder network.

Background technique

[0002] Open-ended video question answering is an important problem in the field of video information retrieval. Its goal is to automatically generate answers for a given video and a corresponding question.

[0003] The existing technology mainly addresses question answering about static images. For questions related to video, the approach adopted is simply to collapse the video in time into a single image and to treat the task as an image question answering problem, solved with image question answering methods. Although such methods achieve good performance on static image question answering, they cannot make good use of the sequence relationship ...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F17/30, G06K9/62
CPC: G06F16/783, G06F18/214
Inventor: 赵洲, 孟令涛, 杨启凡, 肖俊, 吴飞, 庄越挺
Owner: ZHEJIANG UNIV