Visual question and answer enhancement method based on graph convolution

A graph convolution technique for visual question answering, applied in the fields of computer vision and natural language processing, that addresses the inability of existing methods to explore high-level semantics well and improves accuracy.

Active Publication Date: 2019-11-01

HANGZHOU DIANZI UNIV

Cites: 2. Cited by: 15.


Problems solved by technology

[0006] The purpose of the present invention is to use a GCN together with the relationships between the objects in the picture to address the problem that visual question answering cannot explore high-level semantics well. In the VQA task, the relationships between the objects in the image can help us understand the reasoning process, which allows us to select the most relevant node for each vertex according to the question.


Embodiment Construction

[0014] In order to make the above objects, features and advantages of the present invention more comprehensible, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0015] The visual question answering enhancement method based on graph convolution proposed by the present invention is shown in Figure 1. The first step of our model is feature extraction: a GRU produces the feature representation of the question, and the output of the bottom-up attention model, extracted with Faster R-CNN, serves as the feature representation of the image.
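The question-encoding half of this step can be illustrated with a minimal sketch of a GRU, written in plain Python. It is an assumption-laden toy, not the patent's implementation: the helper names (`init_gru`, `gru_step`, `encode_question`), the random initialisation, the omission of bias terms, and the toy word vectors are all hypothetical. The image side (Faster R-CNN region features) is not shown.

```python
import math
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_gru(input_dim, hidden_dim, seed=0):
    """Hypothetical initialiser: small random weights, biases omitted."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    # One (input weights, recurrent weights) pair per gate:
    # z = update gate, r = reset gate, h = candidate state.
    return {name: (mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim))
            for name in ("z", "r", "h")}

def gru_step(params, x, h):
    """One GRU update for input vector x and previous hidden state h."""
    def gate(name, h_in, activation):
        w, u = params[name]
        return [activation(a + b) for a, b in zip(matvec(w, x), matvec(u, h_in))]
    z = gate("z", h, sigmoid)
    r = gate("r", h, sigmoid)
    candidate = gate("h", [ri * hi for ri, hi in zip(r, h)], math.tanh)
    # Blend the old state and the candidate state using the update gate.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, candidate)]

def encode_question(params, word_vectors, hidden_dim):
    """Run the GRU over the word vectors; the final state represents the question."""
    h = [0.0] * hidden_dim
    for x in word_vectors:
        h = gru_step(params, x, h)
    return h

params = init_gru(input_dim=3, hidden_dim=4)
question_vec = encode_question(params, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], hidden_dim=4)
```

In practice a library GRU with learned word embeddings would replace this toy; the sketch only shows how a variable-length question collapses into one fixed-length vector.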

[0016] A graph learner then learns the adjacency matrix of the image objects conditioned on the question, and adds the relations between the objects detected by the relational feature detector. Finally, we process the graph features and combine them with the question features for multi-class classification to predict the correct answer.
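One plausible way to learn a question-conditioned adjacency matrix is sketched below. The specific form is an assumption, since the text above does not give the formula: each object feature is concatenated with the question vector, projected by a weight matrix, and pairwise dot products of the projected vectors give the edge scores. The function names and the identity weight matrix in the usage lines are hypothetical.

```python
def concat_question(object_feats, question_vec):
    """Append the question vector to every object feature vector."""
    return [list(v) + list(question_vec) for v in object_feats]

def project(rows, weight):
    """Apply a linear map (weight is out_dim x in_dim) to every row."""
    return [[sum(w * x for w, x in zip(w_row, row)) for w_row in weight]
            for row in rows]

def learn_adjacency(object_feats, question_vec, weight):
    """Edge score (i, j) = dot product of the projected joint embeddings (assumed form)."""
    joint = project(concat_question(object_feats, question_vec), weight)
    n = len(joint)
    return [[sum(a * b for a, b in zip(joint[i], joint[j])) for j in range(n)]
            for i in range(n)]

# Toy usage: three objects with 2-d features, a 1-d question vector,
# and an identity projection standing in for learned weights.
objects = [[1, 0], [0, 1], [1, 1]]
question = [1]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
adjacency = learn_adjacency(objects, question, identity)
# adjacency == [[2, 1, 2], [1, 2, 2], [2, 2, 3]]
```

Because the scores are dot products of the same embeddings, the resulting matrix is symmetric; a learned, non-identity projection would let the question reweight which object pairs count as related.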

[0017] The specific imp...


Abstract

The invention discloses a visual question answering enhancement method based on graph convolution. The method comprises the following steps: step 1, extracting feature representations of the picture and of the question separately; step 2, extracting the relationships between targets in the picture, conditioned on the question; and step 3, building a graph from the picture together with the question information, selecting the most relevant target for each vertex, generating a new feature representation for each vertex, and performing max pooling and classification on the graph. The method explores high-level semantics by using a GCN together with the relationships between the objects in the picture, and is of great significance for visual question answering technology.
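Step 3 can be sketched as follows, under stated assumptions. The abstract does not specify how neighbours are weighted or how the classifier is built, so this toy keeps the `k` highest-scoring neighbours per vertex (`k = 1` matches "the most relevant target"), averages their features with softmax weights in place of a learned graph convolution kernel, max-pools over vertices, and picks the class with the highest linear score. All function names and the example numbers are hypothetical.

```python
import math

def top_k_neighbors(adjacency, k):
    """For each vertex, keep the indices of its k highest-scoring neighbours."""
    return [sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
            for row in adjacency]

def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(features, adjacency, neighbors):
    """New vertex feature = softmax-weighted mean of the selected neighbours' features."""
    new_features = []
    for i, nbrs in enumerate(neighbors):
        weights = softmax([adjacency[i][j] for j in nbrs])
        new_features.append([
            sum(w * features[j][d] for w, j in zip(weights, nbrs))
            for d in range(len(features[i]))
        ])
    return new_features

def max_pool(features):
    """Element-wise maximum over all vertices, giving one graph-level vector."""
    return [max(column) for column in zip(*features)]

def classify(pooled, class_weights):
    """Return the index of the class with the highest linear score."""
    scores = [sum(w * x for w, x in zip(row, pooled)) for row in class_weights]
    return max(range(len(scores)), key=lambda c: scores[c])

adjacency = [[2, 1, 2], [1, 2, 2], [2, 2, 3]]
features = [[1, 0], [0, 1], [1, 1]]
neighbors = top_k_neighbors(adjacency, k=2)
pooled = max_pool(aggregate(features, adjacency, neighbors))
answer_index = classify(pooled, [[1, 0], [0, 2]])
```

Max pooling makes the graph-level vector independent of the order in which objects were detected, which is one reason it is a natural choice before the answer classifier.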

Description

Technical field

[0001] The invention belongs to the technical fields of computer vision and natural language. In particular, the invention relates to a method for enhancing visual question answering based on graph convolution.

[0002] Technical background

[0003] Visual Question Answering (VQA) is an emerging topic that has attracted much attention in recent years. It combines the fields of computer vision and natural language processing (NLP) and requires us to have a good understanding of both. VQA systems take images and free-form natural language questions as input and generate natural language answers as output. Most VQA methods treat the task as a classification task and extract image and question features separately. Afterwards, they explored the problem of multimodal feature fusion of image representations learned from deep convolutional neural networks (CNNs) and from time-series models such as long short-term memory (LSTM) and gated recurrent neural networks (GR...


Application Information


IPC(8): G06F16/583, G06F16/9032, G06K9/46, G06N3/04, G06N3/08

CPC: G06F16/5854, G06F16/90332, G06N3/08, G06V10/422, G06N3/048, G06N3/045

Inventor: 颜成钢, 俞灵慧, 孙垚棋, 张继勇, 张勇东

Owner: HANGZHOU DIANZI UNIV
