Method for solving video questions and answers by use of hierarchical space-time attention coder-decoder network mechanism

An encoder-decoder and attention technology, applied in the fields of instruments, computer parts, and special data processing applications, addressing problems such as the inability to make good use of the sequential relationship between video frames

Active Publication Date: 2017-12-12
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to overcome the deficiency of the prior art that the sequential relationship between the frames of a video cannot be well exploited. To this end, the present invention provides a method for generating relevant answers to open-ended video questions using a layered spatio-temporal attention encoder-decoder network.
[0006] A layered spatio-temporal attention encoder-decoder network is used to solve the open-ended video question answering problem, comprising the following steps:

Method used



Examples


Embodiment

[0077] The present invention is experimentally verified on a self-constructed data set containing 201,068 GIF clips and 287,933 text descriptions, from which question-answer pairs are generated from the video descriptions. The verification experiments cover 4 types of questions, concerning respectively the object, number, color and location in the video. The constructed video question answering data set is then preprocessed as follows (a code sketch of this preprocessing is given after the list):

[0078] 1) Sample 25 frames from each video, resize each frame to 224×224, and use VGGNet to obtain a 4096-dimensional feature expression of each frame. For each frame, the present invention selects 3 regions as candidate regions.

[0079] 2) For the questions and answers, the present invention utilizes a word2vec model trained in advance to extract the semantic expressions of the questions and answer...
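Below is a minimal sketch of the preprocessing described in [0078]-[0079]. It assumes torchvision's VGG16 as a stand-in for the VGGNet mentioned in the text (the 4096-dimensional feature is taken from the first fully connected layer) and a pre-trained gensim word2vec model for the question/answer embeddings; the selection of 3 candidate regions per frame is not shown, and all names here are illustrative rather than taken from the patent.

```python
# Preprocessing sketch for steps [0078]-[0079] (assumptions: torchvision VGG16,
# 4096-d feature from the first FC layer, gensim word2vec embeddings).
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# VGG16 with the classifier truncated after the first FC layer -> 4096-d output.
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
vgg.classifier = torch.nn.Sequential(*list(vgg.classifier.children())[:2])
vgg.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),  # reset each frame to 224x224
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def frame_features(frame_paths, num_frames=25):
    """Sample 25 frames per video and return a (25, 4096) feature matrix."""
    step = max(len(frame_paths) // num_frames, 1)
    sampled = frame_paths[::step][:num_frames]
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in sampled])
    with torch.no_grad():
        return vgg(batch)  # (num_frames, 4096)

def question_embedding(question, w2v):
    """Average pre-trained word2vec vectors as the semantic expression of a question."""
    vecs = [w2v[w] for w in question.lower().split() if w in w2v]
    return sum(vecs) / len(vecs) if vecs else None
```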



Abstract

The invention discloses a method for solving video question answering by use of a hierarchical space-time attention coder-decoder network mechanism. The method mainly comprises the steps that (1) a space-time attention encoding neural network is trained on a training set of videos, questions and answers, and a co-expression of the videos and the questions is learned; and (2) a decoding neural network is trained on the output of the trained encoding network, i.e. the video-question co-expression, together with the relevant answers, and is used to output corresponding natural language answers according to the co-expression of the videos and the questions. Compared with ordinary video question answering solutions, a temporal attention mechanism is utilized so that the sequential relationship between video frames is better exploited; meanwhile, a spatial attention mechanism is utilized to pinpoint the key locations within the video frames, so that the characteristics of the videos and the questions are reflected more accurately and answers that better meet the requirements are generated. The effect of the method on video question answering problems is better than that of traditional methods.
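As an illustration of the two stages summarized above, the following is a minimal sketch in PyTorch. The spatial attention over candidate regions, the temporal attention over frames, and the encoder/decoder split follow the abstract; everything else (hidden sizes, the GRU layers, the multiplicative fusion of the video and question representations, and the fixed-length greedy decoder) is an assumption made for this example and is not taken from the patent text.

```python
# Sketch of the two-stage structure in the abstract (all dimensions and layer
# choices are assumptions made for illustration).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatioTemporalAttentionEncoder(nn.Module):
    """Step (1): learn a co-expression of video and question.

    Spatial attention weights the candidate regions inside each frame;
    temporal attention then weights the frames, conditioned on the question.
    """
    def __init__(self, region_dim=4096, q_dim=300, hidden=512):
        super().__init__()
        self.q_proj = nn.Linear(q_dim, hidden)
        self.r_proj = nn.Linear(region_dim, hidden)
        self.spatial_score = nn.Linear(hidden, 1)
        self.temporal_score = nn.Linear(hidden, 1)
        self.frame_rnn = nn.GRU(hidden, hidden, batch_first=True)

    def forward(self, regions, question):
        # regions: (B, T, R, region_dim)   question: (B, q_dim)
        q = torch.tanh(self.q_proj(question))                # (B, H)
        r = torch.tanh(self.r_proj(regions))                 # (B, T, R, H)
        # spatial attention over the R candidate regions of every frame
        s = self.spatial_score(r + q[:, None, None, :])      # (B, T, R, 1)
        frames = (F.softmax(s, dim=2) * r).sum(dim=2)        # (B, T, H)
        # a sequence model keeps the order relationship between frames
        frames, _ = self.frame_rnn(frames)                   # (B, T, H)
        # temporal attention over the T frames
        t = self.temporal_score(frames + q[:, None, :])      # (B, T, 1)
        video = (F.softmax(t, dim=1) * frames).sum(dim=1)    # (B, H)
        return video * q                                     # co-expression (assumed fusion)

class AnswerDecoder(nn.Module):
    """Step (2): decode the co-expression into a natural-language answer."""
    def __init__(self, hidden=512, vocab=10000):
        super().__init__()
        self.rnn = nn.GRUCell(hidden, hidden)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, coexpr, max_len=5):
        h, logits = coexpr, []
        for _ in range(max_len):
            h = self.rnn(coexpr, h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)                    # (B, max_len, vocab)
```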

Description

Technical field

[0001] The present invention relates to video question answering text generation, and more particularly to a method for generating answers to video-related questions using a layered spatio-temporal attention encoder-decoder network.

Background technique

[0002] Open-ended video question answering is an important problem in the field of video information retrieval. Its goal is to automatically generate answers for a given video and a corresponding question.

[0003] The existing technology mainly addresses question answering about static images. For questions related to video, the approach adopted is simply to collapse the video in time into a single image and to treat the task as an image question answering problem, solved with image question answering methods. Although such methods achieve good performance on static image question answering, they cannot make good use of the sequence relationship ...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F17/30, G06K9/62
CPC: G06F16/783, G06F18/214
Inventor: 赵洲, 孟令涛, 杨启凡, 肖俊, 吴飞, 庄越挺
Owner: ZHEJIANG UNIV