Visual question answering method based on GAT relational reasoning

A visual question answering technique based on relational reasoning, applied in the field of image processing, that addresses the neglect of spatial reasoning, semantic relations and scene understanding in existing methods and improves answer accuracy.

Pending Publication Date: 2022-03-11
XIAN UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a visual question answering method based on GAT relational reasoning that overcomes the tendency of existing visual question answering methods to ignore spatial reasoning, semantic relations and scene understanding.

Examples

Detailed Description of the Embodiments

[0049] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0050] The visual question answering method based on GAT relational reasoning of the present invention is implemented according to the following steps:

[0051] Step 1, question embedding: divide the question into independent words according to punctuation marks and spaces; vectorize the words with the GloVe word vector model; and extract the question vector representation with a bidirectional gated recurrent unit (BiGRU). This step also aims to reduce the impact of question noise on the answer prediction results. Specifically:

[0052] Step 1.1: First, divide the input question into individual words according to punctuation marks and spaces. The input question is thereby converted into an array of words, expressed by the following formula:

[0053] $q = [q_1, q_2, \ldots, q_N]$

[0054] where N is the number of words ...
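Step 1.1 can be illustrated with a minimal sketch, assuming that any run of non-word characters (spaces and punctuation) acts as a separator. The function name split_question and the lowercasing step are assumptions made for illustration only; the source specifies neither.

```python
import re


def split_question(question: str) -> list[str]:
    """Split a question into words on punctuation marks and spaces."""
    # Lowercasing is an assumption, made so that words would match an
    # uncased word-vector vocabulary; the source does not state it.
    pieces = re.split(r"\W+", question.lower())
    # re.split leaves empty strings at the ends when the text starts or
    # ends with a separator, so drop them.
    return [word for word in pieces if word]


# q is the word array [q_1, ..., q_N]; N is the number of words.
q = split_question("What is on the table, a cup or a plate?")
N = len(q)
```

Each word in q would then be looked up in the word vector model before sentence feature extraction.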

Abstract

The invention discloses a visual question answering method based on GAT relational reasoning. First, the question is divided into words, the words are vectorized, and sentence features are extracted to obtain a question feature vector. Next, Faster R-CNN combined with the ResNet-101 network model is used to obtain object spatial coordinates and object categories, and the BUTD model uses these coordinates and categories to obtain <attribute class, object class> two-tuples. A relation decoder then produces edge labels between objects, a question-guided graph attention convolutional network dynamically updates the graph node information, and finally the graph representation and the question features are fused multimodally and fed into a multi-layer perceptron to obtain the answer. Ablation experiments on the GAT2R model on a data set verify that accuracy improves over the BUTD baseline model.
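The idea of a question-guided attention update over graph nodes can be sketched as follows. This is a simplified illustration, not the GAT2R model itself: it assumes the question is injected by concatenating its vector onto every node feature, scores neighbours with a plain dot product, and ignores the edge labels from the relation decoder. A real graph attention layer would use learned projections and a learned scoring function. All names here are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def question_guided_update(nodes, edges, question):
    """One attention update of every node from itself and its neighbours.

    nodes:    list of node feature vectors (lists of floats)
    edges:    dict mapping a node index to a list of neighbour indices
    question: question feature vector (list of floats)
    """
    # Assumption: question guidance = concatenating the question vector
    # onto each node feature before scoring.
    guided = [v + question for v in nodes]
    updated = []
    for i, g_i in enumerate(guided):
        neighbours = [i] + edges.get(i, [])  # include a self-loop
        scores = [dot(g_i, guided[j]) for j in neighbours]
        alpha = softmax(scores)  # attention weights sum to 1
        dim = len(nodes[i])
        # New feature = attention-weighted sum of neighbour features.
        updated.append([
            sum(a * nodes[j][k] for a, j in zip(alpha, neighbours))
            for k in range(dim)
        ])
    return updated


nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = {0: [1], 1: [0]}  # node 2 has no neighbours
out = question_guided_update(nodes, edges, question=[0.5, 0.5])
```

Because the attention weights are a softmax, each updated node is a convex combination of its own feature and its neighbours' features, and an isolated node is left unchanged.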

Description

Technical field

[0001] The invention belongs to the technical field of image processing, and in particular relates to a visual question answering method based on GAT relational reasoning.

Background

[0002] The goal of a Visual Question Answering (VQA) system is to answer questions based on the information provided by pictures. It has important research significance because of its wide range of applications. Existing visual question answering methods focus on building new attention mechanisms, which makes the models more and more complex while neglecting research on problems that require spatial reasoning, semantic relations, and even scene understanding. Most VQA system frameworks mainly include an image encoder, a question encoder, multimodal fusion and answer prediction modules. Image representations are learned using convolutional neural networks and text representations are learned using recurrent neural networks, and then the two representations are fused into th...

Application Information

IPC(8): G06F16/532, G06F16/583, G06V10/764, G06V10/80, G06V10/82, G06K9/62, G06N3/04, G06N3/08
CPC: G06F16/532, G06F16/5846, G06N3/08, G06N3/047, G06N3/048, G06N3/045, G06F18/2415, G06F18/253, Y02D10/00
Inventor 缪亚林, 李臻, 童萌, 白宛婷, 李国栋
Owner XIAN UNIV OF TECH