
An image content understanding and visual question answering (VQA) method, storage medium and terminal

A technology for image content understanding and visual question answering, applied in the computer field. It addresses the low accuracy of existing methods, which cannot adequately infer the relationship between images and question keywords because they ignore the dense interaction between images and text, and it aims to improve answer accuracy.

Active Publication Date: 2022-05-17
UNIV OF ELECTRONICS SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

[0004] Current network structures for VQA learn the attention distribution in each modality separately and then fuse the results. This has two defects: (1) the network learns only the coarse interaction between modalities and ignores the dense interaction between images and text, so current co-attention is not sufficient to infer the relationship between images and question keywords; (2) the accuracy of the visual question answering (VQA) task is low.


Examples


Embodiment Construction

[0065] The technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

[0066] The terminology used in this application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this application and the appended claims, the singular forms "a", "said", and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed item...



Abstract

The invention discloses an image content understanding and visual question answering (VQA) method, a storage medium and a terminal. The method includes the following step: inputting an image and a question to be answered into a trained prediction module. The prediction module comprises a fused attention module, a bilinear model and a classifier connected in sequence, and outputs the answer. The invention follows the idea of "representing images and questions, performing feature representation on images and declarative sentences, fusing feature matrices, learning image features according to question features, learning image features according to correct declarative sentences, using correct declarative sentences to guide the model, and obtaining the result" to complete the image content understanding and visual question answering (VQA) tasks. It thereby provides a fused attention method based on the dense interaction between the image and the question keywords, which learns the dense interaction between image and text and so infers the relationship between the image and the question keywords.
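The three-stage prediction module named in the abstract (fused attention, bilinear model, classifier) can be illustrated with a minimal NumPy sketch. This is not the patented implementation: the scaled dot-product affinity, the mean pooling, the low-rank form of the bilinear model, and every name and dimension below (`fused_attention`, `bilinear_fusion`, `U`, `V`, `W`, `b`) are assumptions chosen for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fused_attention(img, qst):
    """Dense interaction: every image region is scored against every question word.

    img: (R, d) region features, qst: (T, d) word features (hypothetical shapes).
    Returns one attended vector per modality.
    """
    affinity = img @ qst.T / np.sqrt(img.shape[1])   # (R, T) region-word scores
    region_ctx = softmax(affinity, axis=1) @ qst     # each region attends over words
    word_ctx = softmax(affinity, axis=0).T @ img     # each word attends over regions
    v = (img + region_ctx).mean(axis=0)              # pooled image vector, (d,)
    q = (qst + word_ctx).mean(axis=0)                # pooled question vector, (d,)
    return v, q

def bilinear_fusion(v, q, U, V):
    # Low-rank bilinear pooling (an assumed variant): project both, multiply elementwise.
    return (v @ U) * (q @ V)                         # (k,)

def predict(img, qst, params):
    # Attention -> bilinear model -> classifier, connected in sequence.
    v, q = fused_attention(img, qst)
    z = bilinear_fusion(v, q, params["U"], params["V"])
    return softmax(z @ params["W"] + params["b"])    # distribution over answers

# Usage with random (untrained) weights, only to show the shapes involved.
rng = np.random.default_rng(0)
R, T, d, k, A = 36, 14, 64, 32, 10   # regions, words, feature dim, rank, answers
params = {
    "U": rng.normal(size=(d, k)), "V": rng.normal(size=(d, k)),
    "W": rng.normal(size=(k, A)), "b": np.zeros(A),
}
probs = predict(rng.normal(size=(R, d)), rng.normal(size=(T, d)), params)
```

The region-by-word affinity matrix is what distinguishes this from attending to each modality separately: each image region is weighted by its relevance to individual question words, and vice versa, before fusion.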

Description

Technical field

[0001] The invention relates to the field of computer technology, and in particular to an image content understanding and visual question answering (VQA) method, a storage medium and a terminal.

Background

[0002] Image content understanding and visual question answering (VQA) have attracted increasing interest in recent years. Multimodal fusion of global features is the most straightforward VQA solution. The general approach is to first represent the image and the question each as a global feature, and then use a multimodal fusion model to predict the probability of each answer.

[0003] In addition to understanding the visual content of images, VQA requires a full understanding of the semantics of natural language questions. It is therefore necessary to learn both textual attention over questions and visual attention over images. At present, question representation mainly uses LSTM, and multimodal fusion mainly uses the residual network...
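For contrast, the global-feature approach described in the background can be sketched as follows. This is only an illustration: mean pooling as the global representation, concatenation as the fusion step, and the names `global_fusion_predict`, `W` and `b` are assumptions, not details given in the source.

```python
import numpy as np

def global_fusion_predict(img_regions, word_vecs, W, b):
    """Baseline: one global vector per modality, fused, then classified.

    img_regions: (R, d), word_vecs: (T, d), W: (2*d, A), b: (A,) (hypothetical shapes).
    """
    v = img_regions.mean(axis=0)        # global image feature (assumed mean pooling)
    q = word_vecs.mean(axis=0)          # global question feature
    fused = np.concatenate([v, q])      # simple multimodal fusion by concatenation
    logits = fused @ W + b
    logits = logits - logits.max()      # stabilise the softmax
    p = np.exp(logits)
    return p / p.sum()                  # probability of each candidate answer

# Usage with random, untrained weights.
rng = np.random.default_rng(1)
answer_probs = global_fusion_predict(
    rng.normal(size=(36, 16)), rng.normal(size=(14, 16)),
    rng.normal(size=(32, 5)), np.zeros(5),
)
```

Because each modality is collapsed to a single vector before fusion, no individual image region is ever compared with an individual question word, which is the coarse interaction the invention sets out to improve on.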


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06V10/80, G06V10/82, G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/084, G06N3/047, G06N3/048, G06F18/253
Inventor: 匡平, 张婷
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA