Method for solving video question and answer tasks needing common knowledge by using question-knowledge guided progressive space-time attention network

A progressive attention technique, applied to video data retrieval, biological neural network models, metadata-based video data retrieval, and similar areas, can address problems such as insufficient answers and missing information.

Inactive Publication Date: 2020-01-17

ZHEJIANG UNIV

5 Cites, 22 Cited by


AI Technical Summary

Problems solved by technology

However, this approach represents the visual content only coarsely and lacks finer detail, such as the objects in each frame. It is therefore insufficient for answering questions that depend on details of the video content.


Embodiment

[0073] This embodiment constructs a video question answering dataset from the YouTubeClips video dataset, which contains 1,987 videos and 122,708 natural language descriptions collected from the YouTube website. Since the YouTubeClips video dataset contains rich natural language descriptions, the present invention generates questions and related answers according to an automatic question generation method. In this embodiment, the question-answer pairs generated in the YouTube-QA dataset are divided into five categories {"what", "who", "how", "where", "other"} according to the answer attributes. Details about the dataset are summarized below.
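The excerpt says the question-answer pairs fall into five categories according to the answer attributes, but it does not give the assignment rule. As a rough illustration only, the sketch below buckets questions by their leading interrogative word; that rule, the function names, and the sample questions are all assumptions, not the patent's procedure.

```python
# Hypothetical bucketing rule: the source does not state how categories are assigned.
CATEGORIES = ("what", "who", "how", "where", "other")


def categorize(question):
    """Assign a question to one of the five categories by its first word (assumed rule)."""
    words = question.strip().lower().split()
    first = words[0] if words else ""
    # Anything that does not start with a listed interrogative falls into "other".
    return first if first in CATEGORIES[:-1] else "other"


def count_by_category(questions):
    """Count how many questions land in each category."""
    counts = {category: 0 for category in CATEGORIES}
    for question in questions:
        counts[categorize(question)] += 1
    return counts


# Made-up sample questions, one per category.
sample = [
    "What is the man holding?",
    "Who is singing?",
    "How many dogs are running?",
    "Where is the cat?",
    "Is the woman cooking?",
]
counts = count_by_category(sample)
```

A tally like `counts` is the kind of per-category summary a dataset description would normally report.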

[0074] This embodiment discards videos for which no question can be generated from the description. The YouTube-QA dataset therefore contains 1,970 videos, along with 122,708 natural language descriptions and 50,505 question-answer pairs. In this embodiment, the dataset is divided into three parts: training set, verification...


Abstract

The invention discloses a method for solving video question answering tasks that require common-sense knowledge by using a question-knowledge guided progressive space-time attention network. The method comprises the following steps: for a video, obtaining a set of video objects using Faster-RCNN; retrieving the annotation text corresponding to the video object set from an external knowledge base to obtain external knowledge; extracting semantic features of the external knowledge using Doc2Vec to obtain a knowledge feature set for the video; for the question, converting the input words into word embedding vectors using an embedding layer; and inputting the word embedding vectors into the progressive space-time attention network to generate an answer. With this additional information, more specific questions, such as common-sense questions, can be answered. External knowledge and the question are combined to guide progressive video attention in the spatial and temporal dimensions, and a fine-grained joint video representation is learned for answer prediction.
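The data flow in the abstract can be illustrated with a small, dependency-free sketch: look up knowledge for detected objects, then apply attention first over the objects in each frame (space) and then over frames (time), guided by the question and knowledge. This is only a sketch under stated assumptions, not the patented network. The dictionary knowledge base, the dot-product scoring, the element-wise sum used to fuse question and knowledge vectors, and all toy vectors are hypothetical stand-ins for Faster-RCNN, Doc2Vec, and embedding-layer outputs.

```python
import math

# Hypothetical stand-in for the external knowledge base: object label -> annotation text.
KNOWLEDGE_BASE = {
    "dog": "A dog is a domesticated animal often kept as a pet.",
    "frisbee": "A frisbee is a plastic disc thrown for play.",
}


def retrieve_knowledge(object_labels):
    """Return annotation texts for the detected objects that the knowledge base covers."""
    return [KNOWLEDGE_BASE[label] for label in object_labels if label in KNOWLEDGE_BASE]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(features, guide):
    """Weight each feature vector by its similarity to the guide vector, then sum."""
    weights = softmax([dot(f, guide) for f in features])
    pooled = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(len(guide))]
    return pooled, weights


def progressive_attention(video, guide):
    """Spatial attention over the objects in each frame, then temporal attention over frames."""
    frame_vectors = [attend(objects, guide)[0] for objects in video]
    return attend(frame_vectors, guide)


# Toy input: 2 frames with 2 object feature vectors each (assumed detector output).
video = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.0, 0.2]],
]
question_vec = [1.0, 0.0]   # assumed output of the word embedding layer
knowledge_vec = [0.5, 0.0]  # assumed Doc2Vec feature of the retrieved annotations
guide = [q + k for q, k in zip(question_vec, knowledge_vec)]  # assumed fusion: element-wise sum

video_repr, frame_weights = progressive_attention(video, guide)
```

Here `video_repr` plays the role of the joint video representation that an answer predictor would consume, and `frame_weights` shows which frames the guide vector emphasizes.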

Description

Technical field

[0001] The invention relates to answer generation for video question answering, and in particular to a method for solving video question answering tasks that require common sense by using a question-knowledge guided progressive spatiotemporal attention network.

Background art

[0002] Visual question answering (VQA) is a task that bridges computer vision (CV) and natural language processing (NLP): given a user's question, it automatically returns an accurate answer from the referenced visual content. Depending on the type of visual content, there are two main kinds of visual question answering: image question answering and video question answering. In recent years, a great deal of work has been done in the field of visual question answering, but most existing work focuses on static image question answering.

[0003] Video question answering is a nascent field in which far less work has been done by researchers than image question...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F16/332, G06F16/78, G06N3/04, G06K9/00

CPC: G06F16/3329, G06F16/78, G06N3/049, G06V20/41, G06V20/46, G06N3/045

Inventor: 赵洲, 张品涵, 金韦克, 陈默沙

Owner: ZHEJIANG UNIV
