Visual question and answer method based on GAT relation reasoning

A relational-reasoning technique for visual question answering, applied in the field of image processing, that addresses the neglect of spatial reasoning, semantic relations, and scene understanding in existing methods and improves answer accuracy.

Pending Publication Date: 2022-03-11

XIAN UNIV OF TECH


Problems solved by technology

[0004] The purpose of the present invention is to provide a visual question answering method based on GAT relational reasoning that overcomes the tendency of existing visual question answering methods to ignore spatial reasoning, semantic relations, and scene understanding.


Embodiment Construction

[0049] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0050] The visual question answering method of the present invention, based on GAT relational reasoning, is implemented according to the following steps:

[0051] Step 1, question embedding: split the question into individual words at punctuation marks and spaces; vectorize the words with the GloVe word vector model; and extract the question vector representation with a bidirectional gated recurrent unit (GRU). This step is also intended to reduce the impact of question noise on the answer prediction results. Specifically:

[0052] Step 1.1: First, split the input question into individual words at punctuation marks and spaces. The input question is thereby converted into an array of words, expressed by the following formula:

[0053] $q = [q_1, q_2, \ldots, q_N]$

[0054] where N is the number of words ...
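
The word-splitting in Step 1.1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `split_question`, the lowercasing, and the choice to keep apostrophes inside words are all assumptions made for the example.

```python
import re
from typing import List


def split_question(question: str) -> List[str]:
    """Split a question into words at punctuation marks and whitespace.

    Assumptions (not stated in the source): words are lowercased, and
    apostrophes are kept inside words so that "dog's" stays one token.
    """
    # Any run of characters that is not a word character or an apostrophe
    # is treated as a separator.
    pieces = re.split(r"[^\w']+", question.lower())
    # Splitting leaves empty strings at the edges; drop them.
    return [p for p in pieces if p]


# q corresponds to [q_1, ..., q_N]; N is the number of words.
q = split_question("What color is the dog's collar?")
N = len(q)
print(q, N)
```

Each resulting word would then be looked up in a pretrained GloVe table before being passed to the bidirectional GRU; those stages are not shown here.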


Abstract

The invention discloses a visual question answering method based on GAT relational reasoning. The method comprises the following steps: first, the question is split into words, the words are vectorized, and sentence features are extracted to obtain a question feature vector; then Faster R-CNN is used in combination with the ResNet-101 network model to obtain object spatial coordinates and object categories, and the BUTD model uses the object spatial coordinates and object categories to obtain <attribute class, object class> two-tuples; next, a relation decoder obtains edge labels between objects, and a question-guided graph attention convolutional network dynamically updates the graph node information; finally, the graph representation and the question features are fused multi-modally and input into a multi-layer perceptron to obtain the answer. Ablation experiments on the GAT2R model on a data set show improved accuracy compared with the reference model BUTD.
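
To make the idea of a question-guided graph attention update concrete, here is a rough sketch in plain Python. It follows the generic graph attention pattern (score each neighbor, apply softmax, take the weighted sum) and is not the formulation used in the patent, whose equations are not visible in this text. Conditioning each node on the question by concatenation, omitting the learned projection matrix, ignoring the edge labels from the relation decoder, and every name and value below are assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def question_guided_attention(
    nodes: List[Vector],
    question: Vector,
    neighbors: Dict[int, List[int]],
    a: Vector,
) -> Tuple[List[Vector], Dict[int, List[float]]]:
    """One attention-weighted update of every graph node.

    Assumption: each node is conditioned on the question by concatenation,
    and `a` is a (hypothetical) attention vector of length
    2 * (len(nodes[0]) + len(question)).
    """
    h = [v + question for v in nodes]  # list concatenation: [v_i || question]
    dim = len(nodes[0])
    updated: List[Vector] = []
    weights: Dict[int, List[float]] = {}
    for i in range(len(nodes)):
        nbrs = neighbors[i]
        # Unnormalized score for each neighbor j of node i.
        scores = [
            leaky_relu(sum(w * x for w, x in zip(a, h[i] + h[j])))
            for j in nbrs
        ]
        alpha = softmax(scores)
        weights[i] = alpha
        # New node feature: attention-weighted sum of neighbor features.
        updated.append(
            [sum(al * nodes[j][d] for al, j in zip(alpha, nbrs)) for d in range(dim)]
        )
    return updated, weights


# Toy graph: three detected objects with 2-d features and a 2-d question vector.
nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question = [0.5, -0.5]
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}  # self-loops included
a = [0.1, -0.2, 0.3, 0.0, 0.2, 0.1, -0.1, 0.4]
updated, weights = question_guided_attention(nodes, question, neighbors, a)
print(updated)
```

In a trained model the attention vector and a projection matrix would be learned, and the updated node features would then be fused with the question features and passed to the multi-layer perceptron.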

Description

Technical field

[0001] The invention belongs to the technical field of image processing, and in particular relates to a visual question answering method based on GAT relational reasoning.

Background

[0002] The goal of a Visual Question Answering (VQA) system is to answer questions based on the information provided by images. It has important research significance because of its wide range of applications. Existing visual question answering methods focus on building new attention mechanisms, which makes the models more and more complex while neglecting problems that require spatial reasoning, semantic relations, and even scene understanding. Most VQA system frameworks mainly comprise image encoder, question encoder, multimodal fusion, and answer prediction modules. Image representations are learned using convolutional neural networks and text representations are learned using recurrent neural networks, and then the two representations are fused into th...


Application Information


IPC(8): G06F16/532, G06F16/583, G06V10/764, G06V10/80, G06V10/82, G06K9/62, G06N3/04, G06N3/08

CPC: G06F16/532, G06F16/5846, G06N3/08, G06N3/047, G06N3/048, G06N3/045, G06F18/2415, G06F18/253, Y02D10/00

Inventor 缪亚林, 李臻, 童萌, 白宛婷, 李国栋

Owner XIAN UNIV OF TECH
