Method for solving video question and answer task based on multi-mode progressive attention model

Active Publication Date: 2021-11-23
HARBIN UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0005] In this context, the embodiments of the present invention aim to provide a method for solving video question answering tasks based on a multimodal progressive attention model, so as to overcome the inability of the prior art to provide sufficiently accurate answers for video question answering tasks.

Embodiment Construction

[0054] The principle and spirit of the present invention will be described below with reference to several exemplary embodiments. It should be understood that these embodiments are given only to enable those skilled in the art to better understand and implement the present invention, but not to limit the scope of the present invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

[0055] Those skilled in the art will appreciate that the embodiments of the present invention can be realized as a system, apparatus, device, method, or computer program product. Therefore, the present disclosure may be implemented entirely in hardware, entirely in software (including firmware, resident software, microcode, etc.), or as a combination of hardware and software.

[0056] According to an embodiment of the present invention, a method for solving ...

Abstract

The embodiment of the invention provides a method for solving video question answering tasks based on a multimodal progressive attention model. The method comprises the following steps: (1) extracting modal features separately for each kind of modal information in the video question answering task; (2) using the question to perform preliminary attention over the extracted modal features and compute the corresponding weight scores, then using the question to perform iterative attention over the important modal features so as to locate the modal features most relevant to the question; (3) performing cross-modal fusion of the features with a multimodal fusion algorithm, using the question to attend to the multimodal fusion representation of the video, and finding the important video features related to the question; and (4) fusing some of the effective outputs of the model to generate the answer. Compared with existing video question answering solutions, the method can locate the video frames or video picture regions related to the question more accurately, and it achieves better results on the video question answering task than traditional methods.
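Steps 2 and 3 of the abstract can be illustrated with a minimal sketch. It is not the patented model: the scaled dot-product scoring, the additive query update between attention rounds, and the element-wise product used for fusion are all assumptions chosen for brevity, and the function names (`attend`, `progressive_attend`, `fuse`) and the toy feature vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, features):
    """One round of question-guided attention (assumed: scaled dot product)."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, f) / scale for f in features])
    dim = len(features[0])
    # Weighted sum of the feature vectors gives the attended context.
    context = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return weights, context


def progressive_attend(question, features, steps=2):
    """Iterative attention: each round refines the query with the last context."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    query = list(question)
    for _ in range(steps):
        weights, context = attend(query, features)
        # Assumed update rule: add the attended context to the query.
        query = [q + c for q, c in zip(query, context)]
    return weights, context


def fuse(contexts):
    """Assumed cross-modal fusion: element-wise product of modality contexts."""
    fused = list(contexts[0])
    for c in contexts[1:]:
        fused = [a * b for a, b in zip(fused, c)]
    return fused


# Toy inputs: a 2-d question vector and two modalities (hypothetical values).
question = [1.0, 0.0]
frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
audio = [[0.2, 0.8], [0.9, 0.1]]

w_frames, c_frames = progressive_attend(question, frames)
w_audio, c_audio = progressive_attend(question, audio)
fused = fuse([c_frames, c_audio])
```

In this toy setup the weights returned for each modality form a probability distribution over its features, and the feature most aligned with the question receives the largest weight. A real system would replace the hand-written vectors with learned visual, audio, and text encodings.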

Description

Technical field

[0001] Embodiments of the present invention relate to the technical field of video question answering, and more specifically to a method for solving video question answering tasks based on a multimodal progressive attention model.

Background

[0002] In recent years, video question answering has emerged as a challenging new field that has attracted much attention from researchers. The task requires the model to understand the semantic information shared between the video and the question, and to generate an answer based on this semantic information. Because open-ended questions require the model to generate natural language answers automatically, they are currently the more difficult type of question in video question answering tasks.

[0003] In question answering tasks, video information is more complex than image information. Video is an image sequence with strong temporal dynamics, and there are a la...

Application Information

IPC(8): G06F16/9032, G06F16/783, G06F40/284, G06N3/04, G06N3/08, G10L25/03, G10L25/27
CPC: G06F16/90332, G06F16/7834, G06F16/7847, G10L25/03, G10L25/27, G06N3/049, G06N3/08, G06F40/284
Inventor: 孙广路, 刘昕雨, 梁丽丽, 李天麟
Owner HARBIN UNIV OF SCI & TECH