
Method of using multi-layer attention network mechanism to solve video question answering

A technology involving attention mechanisms and video, applied to computer components, special-purpose data processing applications, instruments, and the like; it can solve problems such as the lack of temporal dynamic information modeling.

Active Publication Date: 2018-03-06
ZHEJIANG UNIV


Problems solved by technology

[0006] The purpose of the present invention is to solve the problems in the prior art, namely to overcome the prior art's lack of modeling of the temporal dynamic information in video. Given that a video usually contains both the shape of objects and their motion information, and that the video information relevant to a question is scattered across certain target frames of the video, the present invention provides a method that uses a multi-layer attention network to generate answers to video-related questions.



Examples


Embodiment

[0129] The present invention is experimentally verified on two self-constructed data sets, namely the YouTube2Text data set and the VideoClip data set. The YouTube2Text data set contains 1,987 video clips and 122,708 text descriptions, and the VideoClip data set contains 201,068 video clips and 287,933 text descriptions. The present invention generates corresponding question-answer pairs from the text descriptions in the two data sets. For the YouTube2Text data set, four types of question-answer pairs are generated, concerning the object, number, location, and person in the video; for the VideoClip data set, four types of question-answer pairs are generated, concerning the object, number, color, and location in the video. Subsequently, the present invention performs the following preprocessing on the constructed video question-answering data sets:

[0130] 1) Take 60 frames...
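The preprocessing list is truncated here. As a purely illustrative sketch of what step 1 could look like, the following Python snippet uniformly samples 60 frames per clip with OpenCV; the function name, frame size, and padding policy are assumptions made for illustration, not details taken from the patent.

import cv2
import numpy as np

def sample_frames(video_path, num_frames=60, size=(224, 224)):
    # Uniformly sample `num_frames` frames from a clip and resize them.
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Indices of the frames to keep, spread evenly across the clip.
    wanted = set(np.linspace(0, max(total - 1, 0), num_frames).astype(int).tolist())
    frames, idx = [], 0
    ok, frame = cap.read()
    while ok:
        if idx in wanted:
            frames.append(cv2.resize(frame, size))
        idx += 1
        ok, frame = cap.read()
    cap.release()
    if not frames:
        raise ValueError("no frames decoded from " + video_path)
    # Pad by repeating the last frame if the clip is shorter than num_frames.
    while len(frames) < num_frames:
        frames.append(frames[-1])
    return np.stack(frames)  # shape: (num_frames, H, W, 3)

Each sampled frame would then be passed through the pre-trained convolutional neural network mentioned in the abstract to obtain the frame-level video representations.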



Abstract

The invention discloses a method of utilizing a multi-layer attention network mechanism to solve video question answering. The method mainly includes the following steps: 1) for a group of videos, utilizing a pre-trained convolutional neural network to obtain frame-level and segment-level video representations; 2) using a question-word-level attention mechanism to obtain frame-level and segment-level video representations attended at the question-word level; 3) using a question-level temporal attention mechanism to obtain frame-level and segment-level video representations related to the question; 4) utilizing a question-level fusion attention mechanism to obtain a joint video representation related to the question; and 5) utilizing the obtained joint video representation to generate answers to the questions asked about the videos. Compared with general video question answering solutions, the method utilizes a multi-layer attention mechanism, can more accurately reflect video and question characteristics, and generates better-matching answers. Compared with traditional methods, the method achieves a better effect in video question answering.
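To make the five steps above concrete, the following is a minimal, hedged sketch in PyTorch of how such a multi-layer attention pipeline could be wired together. All class and module names, feature dimensions, the word-level pooling strategy, and the fixed answer-vocabulary classifier are assumptions made for illustration; they are not the patent's exact architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

def attend_weights(query, values, proj):
    # Soft-attention weights of `query` (B, Q) over `values` (B, T, D),
    # scored by a learned linear layer `proj`: Linear(D + Q -> 1).
    B, T, _ = values.shape
    q = query.unsqueeze(1).expand(B, T, query.size(-1))
    scores = proj(torch.cat([values, q], dim=-1)).squeeze(-1)      # (B, T)
    return F.softmax(scores, dim=-1)

class MultiLayerAttentionVQA(nn.Module):
    def __init__(self, feat_dim=2048, emb_dim=300, q_dim=512, num_answers=1000):
        super().__init__()
        self.q_rnn = nn.LSTM(emb_dim, q_dim, batch_first=True)      # question encoder
        self.word_attn = nn.Linear(feat_dim + q_dim, 1)             # step 2
        self.temp_attn = nn.Linear(feat_dim + q_dim, 1)             # step 3
        self.fuse_attn = nn.Linear(feat_dim + q_dim, 1)             # step 4
        self.classifier = nn.Linear(feat_dim + q_dim, num_answers)  # step 5

    def pool_stream(self, feats, word_states, q_vec):
        # 2) Question-word-level attention: average the attention maps produced
        #    by each question word and re-weight the video features with them.
        word_maps = torch.stack(
            [attend_weights(word_states[:, i], feats, self.word_attn)
             for i in range(word_states.size(1))], dim=1)           # (B, L, T)
        feats = feats * word_maps.mean(dim=1).unsqueeze(-1)         # (B, T, D)
        # 3) Question-level temporal attention pools the stream over time.
        alpha = attend_weights(q_vec, feats, self.temp_attn)        # (B, T)
        return (alpha.unsqueeze(-1) * feats).sum(dim=1)             # (B, D)

    def forward(self, frame_feats, seg_feats, q_words):
        # frame_feats (B, Tf, feat_dim) and seg_feats (B, Ts, feat_dim) come from
        # pre-trained CNNs (step 1); q_words (B, L, emb_dim) are question word embeddings.
        word_states, (h, _) = self.q_rnn(q_words)
        q_vec = h[-1]                                                # (B, q_dim)
        v_frame = self.pool_stream(frame_feats, word_states, q_vec)
        v_seg = self.pool_stream(seg_feats, word_states, q_vec)
        # 4) Fusion attention: weight the frame and segment streams by the question.
        streams = torch.stack([v_frame, v_seg], dim=1)               # (B, 2, D)
        beta = attend_weights(q_vec, streams, self.fuse_attn)        # (B, 2)
        joint = (beta.unsqueeze(-1) * streams).sum(dim=1)            # (B, D)
        # 5) Score a fixed answer vocabulary from the joint representation.
        return self.classifier(torch.cat([joint, q_vec], dim=-1))

In use, frame_feats might come from a 2-D CNN applied to the sampled frames and seg_feats from a 3-D CNN applied to short segments; both choices are common in the literature but are stated here only as assumptions.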

Description

Technical field

[0001] The present invention relates to answer generation for video question answering, and more particularly to a method for generating answers to video-related questions using a multi-layer attention network.

Background technique

[0002] Video question answering is an important problem in the field of video information retrieval. Its goal is to automatically generate an answer for a given video and a corresponding question.

[0003] The existing technology mainly addresses question answering over static images. Although current techniques aimed at static image question answering can achieve good performance, such methods lack modeling of the temporal dynamic information in video and therefore cannot be well extended to video question answering tasks.

[0004] In view of the fact that a video often contains the shape and movement information of objects, and that the video information related to the question is...


Application Information

IPC(8): G06F17/30; G06K9/00; G06K9/62
CPC: G06F16/334; G06F16/783; G06V20/46; G06F18/214
Inventor: 赵洲, 孟令涛, 林靖豪, 姜兴华, 蔡登, 何晓飞, 庄越挺
Owner: ZHEJIANG UNIV