Visual question-answer method based on equal attention graph network

An attention graph network technique, applied in the field of image visual question answering, addresses the problems of ignoring the image structure and being unable to effectively locate scene targets, with the effect of sufficient supporting evidence and improved performance.

Pending Publication Date: 2021-06-04

NANJING UNIV OF AERONAUTICS & ASTRONAUTICS

AI Technical Summary

Problems solved by technology

Although existing visual question answering methods have proved their value, they largely ignore the structure of the given image and cannot effectively locate objects in the scene, so they struggle with relational reasoning over large-scale interactions.

Examples

Embodiment 1

[0022] A method for visual question answering based on an equal attention graph network, comprising the following steps:

[0023] Step 1, preprocess the input image I, send the image I to the feature extraction network, and obtain the regional target features composed of K regional features with the highest confidence;

[0024] As a preferred solution, the feature extraction network used in step 1 is a Faster R-CNN network, the value of K is 36, and each regional target feature is represented by a 2048-dimensional vector.
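The selection in step 1 can be illustrated with a minimal sketch. It assumes the detector has already produced one feature vector and one confidence score per candidate region; the function name `select_top_k_regions`, the candidate count of 100, and the random stand-in data are hypothetical and not taken from the patent. Only K = 36 and the 2048-dimensional feature size come from the text.

```python
import random


def select_top_k_regions(features, scores, k=36):
    """Keep the k region features with the highest detection confidence."""
    if len(features) != len(scores):
        raise ValueError("features and scores must describe the same regions")
    # Indices of the candidates, ordered from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [features[i] for i in order], [scores[i] for i in order]


# Hypothetical detector output: 100 candidate regions, 2048-d feature each.
random.seed(0)
NUM_CANDIDATES, FEATURE_DIM = 100, 2048
candidate_features = [
    [random.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)]
    for _ in range(NUM_CANDIDATES)
]
candidate_scores = [random.random() for _ in range(NUM_CANDIDATES)]

region_features, region_scores = select_top_k_regions(
    candidate_features, candidate_scores
)
```

In a real pipeline the features and scores would come from the Faster R-CNN network rather than a random generator; the ranking step itself would stay the same.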

[0025] Step 2. To obtain the input feature representation, the image I is converted into a graph representation G using the regional target features obtained in step 1. G includes nodes representing the target objects and relationship edges corresponding to the relationships between objects. The input question text Q is processed by word embedding and encoding to obtain the question feature q;
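A rough sketch of the data structures in step 2 follows. The excerpt does not say how relationship edges are constructed or which encoder produces q, so everything specific here is an assumption made for illustration: the graph is fully connected and directed, each edge feature is the relative geometry of two bounding boxes, and the question encoder is a plain average of word embeddings. The names `SceneGraph`, `build_graph`, and `encode_question` are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import permutations


@dataclass
class SceneGraph:
    nodes: list                                # one feature vector per object
    edges: dict = field(default_factory=dict)  # (i, j) -> relationship feature


def build_graph(region_features, boxes):
    """Assumption: connect every ordered pair of regions; describe each edge
    by the offset and size ratio of box j relative to box i."""
    graph = SceneGraph(nodes=list(region_features))
    for i, j in permutations(range(len(region_features)), 2):
        (xi, yi, wi, hi), (xj, yj, wj, hj) = boxes[i], boxes[j]
        graph.edges[(i, j)] = [(xj - xi) / wi, (yj - yi) / hi, wj / wi, hj / hi]
    return graph


def encode_question(question, embeddings, dim):
    """Assumption: the mean word embedding stands in for the unspecified
    encoder; unknown words map to a zero vector."""
    tokens = question.lower().rstrip("?").split()
    vectors = [embeddings.get(token, [0.0] * dim) for token in tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy inputs: three regions with 4-d features and (x, y, width, height) boxes.
toy_features = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 1.0, 1.1, 1.2]]
toy_boxes = [(0.0, 0.0, 2.0, 2.0), (4.0, 2.0, 2.0, 4.0), (1.0, 1.0, 1.0, 1.0)]
toy_embeddings = {"what": [1.0, 0.0], "color": [0.0, 1.0]}

G = build_graph(toy_features, toy_boxes)
q = encode_question("What color is the cat?", toy_embeddings, dim=2)
```

With K = 36 regions, a fully connected directed graph of this kind would hold 36 × 35 = 1260 edges, which is small enough to store densely.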

[0026] As a preferred solution, th...

Embodiment 2

[0034] A method for visual question answering based on an equal attention graph network, comprising the following steps:

[0035] Step 1. Preprocess the input image I, send the image I to the feature extraction network, and obtain the regional target features composed of the features of K regions with the highest confidence. The feature extraction network used here is the Faster R-CNN network, the value of K is 36, and each regional target feature is represented by a 2048-dimensional vector.

[0036] Specifically, the Faster R-CNN network is trained by first initializing the Faster R-CNN model with a ResNet-101 network pre-trained on the ImageNet dataset, and then training the model with the annotation information of the Visual Genome dataset.

[0037] Step 2. In order to obtain the input feature representation, the image I is converted into a graph representation G by using the regional target features obtained in step 1. G is composed of th...

Abstract

The invention discloses a visual question answering method based on an equal attention graph network, comprising the following steps. First, regional target features are extracted from an input image, the image is converted into a graph representation, and the input question is encoded. Then, a visual question answering model based on a graph network is established, and processing is divided into two stages: in the first stage, an equal attention mechanism is applied to the graph representation to obtain new node features and relationship edge features; in the second stage, the node features and relationship edge features obtained in the first stage are fused into graph features, which interact with the question to obtain new graph features. Finally, the answer is inferred jointly from the obtained graph features and the question. The invention is applied to image visual question answering. Compared with traditional methods that use overall image features, or with other graph network visual question answering methods that neglect the importance of relationships, the technical scheme of the invention effectively improves the performance of the visual question answering model.
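The two-stage flow described in the abstract can be sketched in a few lines. The excerpt does not define the equal attention mechanism, so this sketch rests on one reading of it: the same question-guided soft attention is applied to node features and to relationship edge features, so neither is neglected. The fusion by concatenation, the element-wise interaction with the question, and the names `attend` and `two_stage_graph_feature` are assumptions for illustration, not the patented formulas. Node features, edge features, and the question vector are assumed to have already been projected to a common dimension.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, query):
    """Question-guided soft attention: dot-product scores, softmax weights,
    and the weighted sum of the feature vectors."""
    weights = softmax([sum(f * x for f, x in zip(feat, query)) for feat in features])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, features))
        for d in range(len(query))
    ]
    return weights, pooled


def two_stage_graph_feature(node_features, edge_features, question):
    # Stage 1 (assumed): identical attention for nodes and for edges.
    node_weights, node_summary = attend(node_features, question)
    edge_weights, edge_summary = attend(edge_features, question)
    # Stage 2 (assumed): fuse by concatenation, then interact with the
    # question through an element-wise product on each half.
    fused = node_summary + edge_summary
    dim = len(question)
    graph_feature = [value * question[i % dim] for i, value in enumerate(fused)]
    return node_weights, edge_weights, graph_feature


# Toy inputs in a shared 2-d space.
toy_nodes = [[1.0, 0.0], [0.0, 1.0]]
toy_edges = [[0.5, 0.5], [1.0, 1.0]]
toy_question = [1.0, 0.0]

node_w, edge_w, graph_feature = two_stage_graph_feature(
    toy_nodes, toy_edges, toy_question
)
```

In this toy run the first node aligns with the question vector and therefore receives the larger attention weight. A final answer classifier over the graph feature and the question is left out because the excerpt gives no detail about it.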

Description

Technical field

[0001] The invention belongs to the technical field of image visual question answering, and in particular relates to a visual question answering method based on an equal attention graph network.

Background

[0002] Visual question answering is the task of outputting a natural language answer given an image and a free-form, open-ended natural language question. As a research direction within visual understanding, it sits at the intersection of computer vision and natural language processing and connects vision and language. With the development of technology in these two major research fields, visual question answering has become a very attractive and active research direction. Since visual question answering requires the ability to process multimodal information simultaneously, it is considered a benchmark for general artificial intelligence and ...

Application Information

IPC(8): G06F16/332, G06K9/46, G06K9/62, G06N3/04, G06N3/08

CPC: G06F16/332, G06N3/08, G06V10/44, G06N3/045, G06N3/044, G06F18/241

Inventor: 袁家斌, 王天星, 刘昕

Owner NANJING UNIV OF AERONAUTICS & ASTRONAUTICS
