Method for solving video question-answer problems by using multi-granularity convolutional network self-attention context network mechanism

A multi-granularity convolutional network technique, applied in video data retrieval, neural learning methods, biological neural network models, and related fields, that addresses the lack of contextual information modeling and achieves faster computation, better time efficiency, and high accuracy.

Inactive Publication Date: 2020-04-10

ZHEJIANG UNIV

Cites: 2; Cited by: 6

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to solve the problems in the prior art. The prior art lacks modeling of the contextual information in a video. In addition, a video often contains both the appearance of objects and their motion information, and the video information relevant to a question is scattered across a few target frames of the video. To overcome these problems, the present invention provides a method for generating answers to video-related questions using a multi-granularity convolutional self-attention context network.

Questions and answers in a video dialogue often contain contextual information. The present invention uses a multi-granularity convolutional self-attention context network to obtain a context-aware joint question-video expression.
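The observation that question-relevant information is scattered across a few target frames can be illustrated with a question-guided temporal attention step. The following is a minimal sketch under stated assumptions, not the patented method: it uses scaled dot-product scoring, and the function names `softmax` and `temporal_attention` as well as the toy feature vectors are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_attention(frame_feats, question_vec):
    """Weight frames by their relevance to the question, then pool them.

    Assumption: relevance is a scaled dot product between each frame
    feature and the question vector; the source does not specify this.
    """
    d = len(question_vec)
    scores = [
        sum(f * q for f, q in zip(frame, question_vec)) / math.sqrt(d)
        for frame in frame_feats
    ]
    weights = softmax(scores)
    # Weighted sum of frame features gives a question-aware video vector.
    pooled = [
        sum(w * frame[i] for w, frame in zip(weights, frame_feats))
        for i in range(d)
    ]
    return weights, pooled

# Toy data: three 2-dimensional frame features and one question vector.
frames = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
question = [1.0, 0.0]
weights, pooled = temporal_attention(frames, question)
```

Frames whose features align with the question receive larger weights, so the pooled vector is dominated by the relevant frames rather than by a uniform average.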

Examples

Embodiment

[0116] The present invention is verified experimentally on data sets produced by a professional crowdsourcing labeling company. Two data sets are used: the YouTubeClips data set and the TACoS-MultiLevel data set. The YouTubeClips data set contains 1987 video clips and 66806 question-answer pairs, and each video has 60 frames. The TACoS-MultiLevel data set contains 1303 video clips and 37228 question-answer pairs, and each video has 80 frames. The present invention then applies the following preprocessing to the constructed video question-answering data sets:

[0117] 1) For questions and answers, the present invention uses a pre-trained word2vec model to extract the semantic expressions of the questions and answers. In particular, the vocabulary contains 6500 words, and each word vector has 100 dimensions.

[0118] 2) For the videos of the YouTubeClips and TACoS-MultiLevel data sets, resize each frame to 224×224, and use...
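The two preprocessing steps above can be sketched as follows. This is an illustrative sketch only: the random embedding table stands in for the pre-trained word2vec model, nearest-neighbour resizing is an assumed interpolation choice, unknown words are mapped to zero vectors by assumption, and all function names are hypothetical. Only the 100-dimensional word vectors and the 224×224 frame size come from the text.

```python
import random

EMBED_DIM = 100   # word-vector dimension stated in the text
FRAME_SIZE = 224  # target frame height and width stated in the text

def build_embedding_table(vocab, dim=EMBED_DIM, seed=0):
    """Stand-in for a pre-trained word2vec table (assumption: random vectors).

    The real vocabulary described in the text has 6500 words.
    """
    rng = random.Random(seed)
    return {word: [rng.uniform(-1, 1) for _ in range(dim)] for word in vocab}

def embed(sentence, table, dim=EMBED_DIM):
    """Map each token to its vector; unknown tokens get zeros (assumption)."""
    return [table.get(tok, [0.0] * dim) for tok in sentence.lower().split()]

def resize_nearest(frame, size=FRAME_SIZE):
    """Nearest-neighbour resize of a 2-D pixel grid to size x size."""
    h, w = len(frame), len(frame[0])
    return [
        [frame[r * h // size][c * w // size] for c in range(size)]
        for r in range(size)
    ]

table = build_embedding_table(["what", "is", "the", "man", "doing"])
vectors = embed("What is the dog doing", table)

# A tiny 4x6 single-channel "frame" used only to exercise the resize.
small = [[r * 10 + c for c in range(6)] for r in range(4)]
resized = resize_nearest(small)
```

In practice a tensor or image library would handle the resizing and a trained model would supply the embeddings; the plain-Python version only makes the shapes explicit.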

Abstract

The invention discloses a method for solving video question-answer problems by using a multi-granularity convolutional network self-attention context network mechanism. The method mainly comprises the following steps: 1) for a group of videos, frame-level and segment-level video expressions are obtained by using a pre-trained VGG network and a 3D-Conv network; 2) for the question word embeddings and answer word embeddings of the dialogue history and the new question, the multi-granularity convolutional self-attention network mechanism and a sentence-level context attention mechanism are adopted to obtain joint expressions related to the questions; and 3) a question-level temporal attention mechanism and a fused attention network mechanism are adopted to obtain joint video expressions related to the questions and to generate answers to the questions asked about the videos. Compared with common video question-answering solutions, the method uses a multi-granularity convolutional self-attention network, so that visual information and dialogue history information can be combined to generate answers that better meet requirements. The method achieves better results on video question-answering problems than traditional methods.
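One way to picture the multi-granularity convolutional self-attention idea in step 2 is to summarize a word-embedding sequence at several window widths and apply self-attention at each width. The sketch below is a generic illustration under stated assumptions, not the patent's network: uniform averaging windows stand in for learned convolution kernels, attention has no learned projections, the widths `(1, 3, 5)` are arbitrary odd values, and every function name is hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def window_average(seq, width):
    """Average each centred window of `width` vectors (odd width assumed).

    Assumption: uniform weights replace learned convolution kernels.
    Windows are clipped at the sequence boundaries.
    """
    half = width // 2
    out = []
    for i in range(len(seq)):
        window = seq[max(0, i - half): i + half + 1]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out

def self_attention(seq):
    """Scaled dot-product self-attention without learned projections."""
    d = len(seq[0])
    out = []
    for query in seq:
        weights = softmax([
            sum(a * b for a, b in zip(query, key)) / math.sqrt(d)
            for key in seq
        ])
        out.append([sum(w * v[j] for w, v in zip(weights, seq))
                    for j in range(d)])
    return out

def multi_granularity_encode(seq, widths=(1, 3, 5)):
    """Concatenate self-attended features computed at several widths."""
    per_width = [self_attention(window_average(seq, w)) for w in widths]
    return [sum((feats[i] for feats in per_width), [])
            for i in range(len(seq))]

# Toy sequence of four 2-dimensional word embeddings.
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
encoded = multi_granularity_encode(words)
```

Each position ends up with one feature slice per width, so narrow windows preserve word-level detail while wider windows contribute phrase-level context.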

Description

Technical field

[0001] The invention relates to answer generation for video question answering, in particular to a method for solving video question-answer problems by using a multi-granularity convolutional network self-attention context network mechanism.

Background technique

[0002] Video question answering is an important problem in the field of video information retrieval. The goal is to automatically generate answers for given videos and their corresponding questions.

[0003] Existing technologies mainly target question answering over static images and have achieved good results, but video question answering still poses great challenges. For example, there is contextual correlation between the visual information and the text information in a video; static images do not reflect this, so a lot of contextual information is ignored. The present invention uses a self-attention mechanism to capture context information. Compared with the current R...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F16/75, G06F16/738, G06F16/783, G06K9/62, G06N3/04, G06N3/08

CPC: G06F16/75, G06F16/783, G06F16/738, G06N3/08, G06N3/047, G06N3/045, G06F18/241, G06F18/2415

Inventors: 赵洲, 李国昌, 金韦克

Owner: ZHEJIANG UNIV
