Method for solving video question and answer task based on multi-mode progressive attention model

Active Publication Date: 2021-11-23

HARBIN UNIV OF SCI & TECH

7 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0005] In this context, embodiments of the present invention aim to provide a method for solving video question answering tasks based on a multimodal progressive attention model, overcoming the inability of the prior art to provide sufficiently accurate answers for video question answering tasks.

Embodiment Construction

[0054] The principle and spirit of the present invention will be described below with reference to several exemplary embodiments. It should be understood that these embodiments are given only to enable those skilled in the art to better understand and implement the present invention, but not to limit the scope of the present invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

[0055] Those skilled in the art know that the embodiments of the present invention can be realized as a system, device, method, or computer program product. Therefore, the present disclosure may be implemented entirely in hardware, entirely in software (including firmware, resident software, microcode, etc.), or as a combination of hardware and software.

[0056] According to an embodiment of the present invention, a method for solving ...

Abstract

An embodiment of the invention provides a method for solving video question answering tasks based on a multimodal progressive attention model. The method comprises the following steps:
1. For each kind of modal information in the video question answering task, extract the corresponding modal features.
2. Use the question to perform preliminary attention over the extracted modal features and compute the corresponding weight scores, then use the question to perform iterative attention over the important modal features so as to locate the modal features most relevant to the question.
3. Achieve cross-modal fusion of the features with a multimodal fusion algorithm, then use the question to attend over the multimodal fused representation of the video and find the important video features related to the question.
4. Fuse some of the model's effective outputs to generate the answer.
Compared with existing video question answering solutions, the method locates the video frames or video picture regions related to the question more accurately, and it achieves better results on the video question answering task than traditional methods.
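The attention and fusion steps in the abstract can be illustrated with a small pure-Python sketch. This is not the patented model: the abstract does not specify the scoring function, the iteration rule, or the fusion algorithm, so scaled dot-product attention, the additive query update, element-wise product fusion, and concatenation of outputs are all assumptions made here for illustration. The names `attend` and `progressive_attention` are hypothetical, and the sketch assumes every modality has the same number of time steps and the same feature dimension as the question vector.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, features: List[Vector]) -> Tuple[List[float], Vector]:
    """Scaled dot-product attention (assumed) of one query over feature vectors."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, f) / scale for f in features])
    context = [sum(w * f[i] for w, f in zip(weights, features))
               for i in range(len(query))]
    return weights, context


def progressive_attention(question: Vector,
                          modal_features: Dict[str, List[Vector]],
                          iterations: int = 2) -> Dict[str, object]:
    """Hypothetical sketch of steps 2 to 4; iterations must be at least 1."""
    names = list(modal_features)

    # Step 2a: preliminary attention per modality, then a weight score per modality.
    contexts = {n: attend(question, modal_features[n])[1] for n in names}
    modality_scores = softmax([dot(question, contexts[n]) for n in names])

    # Step 2b: iterative attention over the highest-scoring modality.
    # The query is refined by adding the attended context (an assumption).
    top = names[modality_scores.index(max(modality_scores))]
    query = list(question)
    frame_weights: List[float] = []
    refined: Vector = []
    for _ in range(iterations):
        frame_weights, refined = attend(query, modal_features[top])
        query = [q + r for q, r in zip(query, refined)]

    # Step 3: fuse aligned time steps by element-wise product (an assumption),
    # then attend over the fused sequence with the question.
    aligned_steps = zip(*(modal_features[n] for n in names))
    fused_seq = [[math.prod(vals) for vals in zip(*step)] for step in aligned_steps]
    _, fused_context = attend(question, fused_seq)

    # Step 4: concatenate the outputs as input to an answer generator (not shown).
    return {
        "top_modality": top,
        "frame_weights": frame_weights,
        "answer_input": refined + fused_context,
    }


# Toy data: two modalities, three time steps, two feature dimensions.
question = [1.0, 0.0]
modal_features = {
    "frame": [[1.0, 0.0], [0.0, 1.0], [5.0, 0.0]],
    "audio": [[0.1, 0.0], [0.0, 0.1], [0.1, 0.1]],
}
result = progressive_attention(question, modal_features)
print(result["top_modality"], result["frame_weights"])
```

In this toy input the third frame vector is the one most aligned with the question, so the iterative pass concentrates its weight there. A real system would replace the hand-written vectors with learned features and a trained answer decoder.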

Description

Technical field

[0001] Embodiments of the present invention relate to the technical field of video question answering, and more specifically to a method for solving video question answering tasks based on a multimodal progressive attention model.

Background

[0002] In recent years, video question answering has become a challenging, emerging field that has attracted much attention from researchers. The task requires the model to understand the semantic information shared between the video and the question, and to generate an answer based on that information. Because open-ended questions require the model to generate natural-language answers automatically, they are currently the more difficult type of question in video question answering tasks.

[0003] In question answering tasks, video information is more complex than image information. Video is an image sequence with strong temporal dynamics, and there are a la...

Application Information

IPC(8): G06F16/9032, G06F16/783, G06F40/284, G06N3/04, G06N3/08, G10L25/03, G10L25/27

CPC: G06F16/90332, G06F16/7834, G06F16/7847, G10L25/03, G10L25/27, G06N3/049, G06N3/08, G06F40/284

Inventor: 孙广路, 刘昕雨, 梁丽丽, 李天麟

Owner: HARBIN UNIV OF SCI & TECH
