
Method for solving video question and answer tasks needing common knowledge by using question-knowledge guided progressive space-time attention network

A question-knowledge guided progressive attention technology, applied to video data retrieval, biological neural network models, video data retrieval using metadata, and related fields, which can solve problems such as insufficient answers and a lack of detailed information.

Publication Date: 2020-01-17 (Inactive)
ZHEJIANG UNIV
Cites: 5 · Cited by: 22

AI Technical Summary

Problems solved by technology

However, this approach is a coarse representation of the visual content and lacks more detailed information, such as the objects in each frame. This makes it insufficient for answering questions that depend on the details of the video content.



Examples


Embodiment

[0073] This embodiment constructs a video question answering dataset from the YouTubeClips video dataset, which contains 1,987 videos and 122,708 natural language descriptions collected from the YouTube website. Since the YouTubeClips video dataset contains rich natural language descriptions, the present invention generates questions and related answers according to an automatic question generation method. In this embodiment, the question-answer pairs generated in the YouTube-QA dataset are divided into five categories {"what", "who", "how", "where", "other"} according to the answer attributes. Details about the dataset are summarized below.
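As a concrete illustration of the categorization step just described, the following minimal sketch (not taken from the patent) assigns a question-answer pair to one of the five categories by keying on the question's leading interrogative word; the function name categorize_qa and the fallback to "other" are assumptions.

```python
# Minimal sketch (assumption): map a generated question to one of the five
# categories used in this embodiment, based on its leading question word.
QUESTION_CATEGORIES = ("what", "who", "how", "where")

def categorize_qa(question: str) -> str:
    """Return 'what', 'who', 'how', 'where', or 'other' for a question string."""
    words = question.strip().lower().split()
    first_word = words[0] if words else ""
    return first_word if first_word in QUESTION_CATEGORIES else "other"

# Example usage
print(categorize_qa("What is the man holding?"))  # -> what
print(categorize_qa("Is the dog swimming?"))      # -> other
```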

[0074] This example discards videos for which no question can be generated from the description. The resulting YouTube-QA dataset therefore contains 1,970 videos, along with 122,708 natural language descriptions and 50,505 question-answer pairs. In this embodiment, the dataset is divided into three parts: a training set, a validation...
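A minimal sketch of such a three-way split (training / validation / test) is shown below; the split fractions, the random seed, and the helper name split_dataset are illustrative assumptions, not the patent's actual protocol.

```python
# Minimal sketch (assumptions): split question-answer pairs into training,
# validation, and test sets. Ratios and seed are illustrative only.
import random

def split_dataset(qa_pairs, val_frac=0.1, test_frac=0.2, seed=0):
    pairs = list(qa_pairs)
    random.Random(seed).shuffle(pairs)           # reproducible shuffle
    n_val = int(len(pairs) * val_frac)
    n_test = int(len(pairs) * test_frac)
    return (pairs[n_val + n_test:],              # training set
            pairs[:n_val],                       # validation set
            pairs[n_val:n_val + n_test])         # test set

# Example usage with placeholder pairs
train, val, test = split_dataset([("what is the cat doing?", "sleeping")] * 100)
print(len(train), len(val), len(test))  # 70 10 20
```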



Abstract

The invention discloses a method for solving video question answering tasks that require common knowledge by using a question-knowledge guided progressive space-time attention network, comprising the following steps: for a video, obtaining a set of video objects using Faster-RCNN; retrieving the annotation text corresponding to the video object set in an external knowledge base to obtain external knowledge; extracting semantic features of the external knowledge with Doc2Vec to obtain the knowledge feature set of the video; for the question, converting each input word into a word embedding vector using an embedding layer; and inputting the word embedding vectors into the progressive space-time attention network to generate an answer. By using this additional information, more specific questions, such as those requiring common knowledge, can be answered; the external knowledge and the question are combined to guide progressive video attention in the spatial and temporal dimensions, and a fine-grained joint video representation is learned for answer prediction.
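To make the pipeline in the abstract concrete, the sketch below implements a simplified question-knowledge guided spatial-then-temporal ("progressive") attention module in PyTorch. The layer choices, dimensions, answer-classifier head, and the random tensors standing in for Faster-RCNN region features and the Doc2Vec/embedding-derived question-knowledge vector are all illustrative assumptions, not the patent's actual network.

```python
# Minimal sketch (assumptions throughout): question+knowledge guided
# spatial-then-temporal attention over video region features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProgressiveSpaceTimeAttention(nn.Module):
    def __init__(self, feat_dim=512, guide_dim=512, num_answers=1000):
        super().__init__()
        self.spatial_score = nn.Linear(feat_dim + guide_dim, 1)   # scores each region
        self.temporal_score = nn.Linear(feat_dim + guide_dim, 1)  # scores each frame
        self.classifier = nn.Linear(feat_dim + guide_dim, num_answers)

    def forward(self, region_feats, guide):
        # region_feats: (T, R, D) per-frame region features; guide: (G,) question+knowledge vector
        T, R, D = region_feats.shape
        g = guide.expand(T, R, -1)
        # Spatial attention: pool regions within each frame, guided by question+knowledge.
        s = self.spatial_score(torch.cat([region_feats, g], dim=-1)).squeeze(-1)   # (T, R)
        frame_feats = (F.softmax(s, dim=-1).unsqueeze(-1) * region_feats).sum(1)   # (T, D)
        # Temporal attention: pool frames, again guided by question+knowledge.
        gt = guide.expand(T, -1)
        t = self.temporal_score(torch.cat([frame_feats, gt], dim=-1)).squeeze(-1)  # (T,)
        video_feat = (F.softmax(t, dim=-1).unsqueeze(-1) * frame_feats).sum(0)     # (D,)
        # Joint representation -> answer logits.
        return self.classifier(torch.cat([video_feat, guide], dim=-1))

# Example with random stand-ins for region features and the question+knowledge vector.
model = ProgressiveSpaceTimeAttention()
logits = model(torch.randn(20, 36, 512), torch.randn(512))
print(logits.shape)  # torch.Size([1000])
```

The guiding vector first weights the regions within each frame (spatial attention), and the resulting frame features are then weighted over time (temporal attention), mirroring the progressive space-time ordering described in the abstract.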

Description

Technical field

[0001] The invention relates to the field of answer generation for video question answering, and in particular to a method for solving video question answering tasks requiring common sense by using a question-knowledge guided progressive spatiotemporal attention network.

Background art

[0002] Visual Question Answering (VQA) is a task that bridges computer vision (CV) and natural language processing (NLP) by automatically returning accurate answers from reference visual content according to a user's question. Depending on the type of visual content, there are two main kinds of visual question answering: image question answering and video question answering. In recent years, a great deal of work has been done in the field of visual question answering; however, most existing work focuses on static image question answering.

[0003] Video question answering is a nascent field in which far less work has been done by researchers than image question...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F16/332; G06F16/78; G06N3/04; G06K9/00
CPC: G06F16/3329; G06F16/78; G06N3/049; G06V20/41; G06V20/46; G06N3/045
Inventors: 赵洲, 张品涵, 金韦克, 陈默沙
Owner: ZHEJIANG UNIV