Method of using multi-layer attention network mechanism to solve video question answering

An attention-based video technology, applied in computer parts, special data processing applications, instruments, etc., that addresses problems such as the lack of temporal dynamic information modeling.

Active Publication Date: 2018-03-06

ZHEJIANG UNIV

2 Cites, 48 Cited by


AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to solve problems in the prior art. The prior art lacks modeling of the temporal dynamic information in video. Moreover, a video often contains both the shape of objects and their movement information, and the video information related to a question is scattered across a few target frames of the video. To address this, the present invention provides a method that uses a multi-layer attention network to generate answers to video-related questions.



Embodiment

[0129] The present invention is experimentally verified on self-constructed data sets. Two data sets are constructed: the YouTube2Text data set and the VideoClip data set. The YouTube2Text data set contains 1987 video clips and 122708 text descriptions, and the VideoClip data set contains 201068 video clips and 287933 text descriptions. The present invention generates corresponding question-answer pairs from the text descriptions in the two data sets. For the YouTube2Text data set, it generates four kinds of question-answer pairs, related respectively to the object, number, location, and person in the video; for the VideoClip data set, it generates four kinds of question-answer pairs, related respectively to the object, number, color, and location in the video. The present invention then performs the following preprocessing on the constructed video question answering data sets:

[0130] 1) Take 60 frames...


Abstract

The invention discloses a method that uses a multi-layer attention network mechanism to solve video question answering. The method mainly comprises the following steps: 1) for a group of videos, using a pre-trained convolutional neural network to obtain frame-level and segment-level video representations; 2) using a question-word-level attention network mechanism to obtain frame-level and segment-level video representations attended at the question-word level; 3) using a question-level temporal attention mechanism to obtain frame-level and segment-level video representations related to the question; 4) using a question-level fusion attention network mechanism to obtain a joint video representation related to the question; and 5) using the joint video representation to obtain the answer to the question asked about the video. Compared with general video question answering solutions, the method uses a multi-layer attention mechanism, which reflects the characteristics of the video and the question more accurately and generates more fitting answers. Compared with traditional methods, the method achieves better results in video question answering.
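Steps 3) and 4) of the abstract can be illustrated with a minimal NumPy sketch. It is not the patented formulation: the additive (tanh) scoring function, the parameter names `W_v`, `W_q`, `w`, the function names, and the random inputs standing in for CNN features and a question encoding are all assumptions made for illustration, since this excerpt does not give the actual equations. The sketch also assumes frame-level and segment-level features share one dimension so that they can be fused.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def temporal_attention(features, question, W_v, W_q, w):
    """Question-guided attention over time (hypothetical scoring function).

    features: (T, d_v) per-frame or per-segment video features
    question: (d_q,)   question encoding
    W_v: (d_v, h), W_q: (d_q, h), w: (h,)  assumed learnable parameters
    Returns the attended feature (d_v,) and the attention weights (T,).
    """
    hidden = np.tanh(features @ W_v + question @ W_q)  # (T, h)
    weights = softmax(hidden @ w)                      # (T,), sums to 1
    context = weights @ features                       # weighted sum over time
    return context, weights


def fusion_attention(frame_ctx, segment_ctx, question, W_v, W_q, w):
    """Attend over the two attended representations to form a joint one."""
    candidates = np.stack([frame_ctx, segment_ctx])    # (2, d_v)
    return temporal_attention(candidates, question, W_v, W_q, w)


# Toy inputs: random stand-ins for CNN features and a question vector.
rng = np.random.default_rng(0)
d_v, d_q, h = 8, 6, 5
frames = rng.normal(size=(60, d_v))    # 60 frame-level features
segments = rng.normal(size=(10, d_v))  # 10 segment-level features
question = rng.normal(size=d_q)
W_v = rng.normal(size=(d_v, h))
W_q = rng.normal(size=(d_q, h))
w = rng.normal(size=h)

frame_ctx, frame_weights = temporal_attention(frames, question, W_v, W_q, w)
segment_ctx, _ = temporal_attention(segments, question, W_v, W_q, w)
joint, fusion_weights = fusion_attention(frame_ctx, segment_ctx, question, W_v, W_q, w)
print(joint.shape, fusion_weights)
```

In this sketch the attention weights show which frames or segments the question focuses on, which is one way to handle question-relevant information being scattered across a few target frames. For brevity the same parameters are reused at every level; a real model would presumably learn separate parameters for each attention layer.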

Description

Technical field

[0001] The present invention relates to answer generation for video question answering, and more particularly to a method for generating answers to video-related questions using a multi-layer attention network.

Background technique

[0002] Video question answering is an important problem in the field of video information retrieval. Its goal is to automatically generate an answer for a given video and a corresponding question.

[0003] Existing technology mainly addresses question answering over static images. Although current techniques perform well on static image question answering, they lack modeling of the temporal dynamic information in video, so they do not extend well to video question answering tasks.

[0004] In view of the situation that a video often contains the shape and movement information of objects, and the video information related to the question is...


Application Information


IPC(8): G06F17/30, G06K9/00, G06K9/62

CPC: G06F16/334, G06F16/783, G06V20/46, G06F18/214

Inventor: 赵洲, 孟令涛, 林靖豪, 姜兴华, 蔡登, 何晓飞, 庄越挺

Owner: ZHEJIANG UNIV
