Image question and answer method based on multi-objective association deep reasoning

A multi-target image technology, applied in the field of deep neural network structures, that addresses problems such as the high degree of freedom of natural-language descriptions and the complexity of image content.

Active Publication Date: 2019-09-20

HANGZHOU DIANZI UNIV

11 Cites, 17 Cited by


AI Technical Summary

Problems solved by technology

[0007] Because image content in natural scenes is complex and its subjects are diverse, and because natural-language descriptions allow a high degree of freedom, describing image content is a major challenge.


Embodiment Construction

[0064] The detailed parameters of the present invention are described further below.

[0065] As shown in Figure 1, the present invention provides a deep neural network framework for Visual Question Answering.

[0066] The data preprocessing and feature extraction of step (1) are applied to both the image and the text, as follows:

[0067] 1-1. Image feature extraction: we use the MS-COCO dataset as training and test data and extract its visual features with an existing Faster-RCNN model. Specifically, each image is fed into the Faster-RCNN network, which detects and draws bounding boxes around 10 to 100 targets in the image; a 2048-dimensional visual feature is extracted for each target, and the coordinates and size {x, y, w, h} of each target's bounding box are recorded as its geometric feature. This gives $V = \{v_1, v_2, \dots, v_k\}$ and $G = \{g_1, g_2, \dots, g_k\}$, where $k \in [10, 100]$.
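The per-image feature layout described in [0067] can be sketched in Python with NumPy. This is a minimal sketch, assuming the Faster-RCNN outputs are already available as arrays; the helper name `pack_targets`, padding to a fixed 100 slots with a boolean mask, and normalizing {x, y, w, h} by image size are illustrative assumptions, not steps stated in the patent.

```python
import numpy as np

K_MIN, K_MAX, FEAT_DIM = 10, 100, 2048  # target-count range and feature size from [0067]


def pack_targets(V, G, img_w, img_h, max_k=K_MAX):
    """Pad per-target features to a fixed size (hypothetical helper).

    V: (k, 2048) Faster-RCNN region features.
    G: (k, 4) boxes as (x, y, w, h) in pixels.
    Returns padded V, normalised padded G, and a boolean mask of real targets.
    """
    V = np.asarray(V, dtype=np.float32)
    G = np.asarray(G, dtype=np.float32)
    k = V.shape[0]
    if not (K_MIN <= k <= max_k):
        raise ValueError(f"expected {K_MIN}..{max_k} targets, got {k}")
    if V.shape[1] != FEAT_DIM or G.shape != (k, 4):
        raise ValueError("V must be (k, 2048) and G must be (k, 4)")
    # Assumption: scale x and w by image width, y and h by image height.
    scale = np.array([img_w, img_h, img_w, img_h], dtype=np.float32)
    V_pad = np.zeros((max_k, FEAT_DIM), dtype=np.float32)
    G_pad = np.zeros((max_k, 4), dtype=np.float32)
    mask = np.zeros(max_k, dtype=bool)
    V_pad[:k] = V
    G_pad[:k] = G / scale
    mask[:k] = True  # rows 0..k-1 hold real targets, the rest is padding
    return V_pad, G_pad, mask


# Usage with random stand-in features for 36 detected targets.
rng = np.random.default_rng(0)
k = 36
V = rng.standard_normal((k, FEAT_DIM))
G = np.tile([10.0, 20.0, 64.0, 48.0], (k, 1))
V_pad, G_pad, mask = pack_targets(V, G, img_w=640, img_h=480)
print(V_pad.shape, G_pad.shape, int(mask.sum()))  # (100, 2048) (100, 4) 36
```

Padding to the upper bound k = 100 lets images with different numbers of detected targets share one batch tensor; the mask records which rows are real targets so later attention steps can ignore the padding.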

[0068] 1-2. For the qu...


Abstract

The invention discloses an image question and answer method based on multi-objective association deep reasoning. The method comprises the following steps: (1) data preprocessing of an image and of the natural-language text describing it; (2) attention reordering over the targets, using an adaptive attention module (AAM) enhanced by the geometric features of the candidate boxes; (3) construction of a neural network structure based on the AAM model; and (4) model training, in which the network parameters are trained with the back-propagation algorithm. The invention provides a deep neural network for image question answering, in particular a method that jointly models image and question-text data, reasons over the features of each target in the image, and reorders the attention over the targets so that questions are answered more accurately, achieving better results in the field of image question answering.
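Step (2) of the abstract, question-guided attention over the detected targets with a geometric term, might look roughly like the following. This is a hypothetical sketch: the excerpt does not give the AAM formulas, so the fusion `tanh(V W_v + q W_q + G W_geo)`, every parameter name, and reading "reordering" as sorting targets by their attention weights are assumptions.

```python
import numpy as np


def masked_softmax(z, mask):
    """Softmax over entries where mask is True; padded entries get weight 0."""
    z = np.where(mask, z, -np.inf)
    z = z - z[mask].max()  # shift for numerical stability
    e = np.exp(z)          # exp(-inf) == 0 for padded targets
    return e / e.sum()


def question_guided_attention(V, G, q, W_v, W_q, w_att, W_geo, mask):
    """Attend over k targets given a question vector q (hypothetical sketch).

    V: (k, d_v) target features; G: (k, 4) normalised boxes; q: (d_q,).
    The G @ W_geo term is an assumed stand-in for the geometry enhancement
    of the patent's AAM, whose exact form is not given in this excerpt.
    """
    h = np.tanh(V @ W_v + q @ W_q + G @ W_geo)  # fused target-question features (k, d_h)
    logits = h @ w_att                          # one score per target (k,)
    alpha = masked_softmax(logits, mask)
    return alpha @ V, alpha                     # attended feature (d_v,), weights (k,)


# Usage: 6 target slots, the last two of which are padding.
rng = np.random.default_rng(0)
k, d_v, d_q, d_h = 6, 16, 12, 8
V = rng.standard_normal((k, d_v))
G = rng.uniform(0, 1, (k, 4))
q = rng.standard_normal(d_q)
mask = np.array([True] * 4 + [False] * 2)
W_v = rng.standard_normal((d_v, d_h))
W_q = rng.standard_normal((d_q, d_h))
w_att = rng.standard_normal(d_h)
W_geo = rng.standard_normal((4, d_h))
att, alpha = question_guided_attention(V, G, q, W_v, W_q, w_att, W_geo, mask)
ranking = np.argsort(-alpha)  # targets ordered by attention weight, highest first
```

Setting `W_geo` to zeros recovers plain question-guided attention, so comparing the two rankings is one way to see how much the box geometry shifts the weights in this sketch.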

Description

Technical field

[0001] The present invention relates to a deep neural network structure for the Visual Question Answering task, and in particular to a method that jointly models image-question data to capture the interaction between the feature of each entity in the image and the geometric feature of its spatial position; by modeling the positional relationships between entities, the attention weights are adjusted adaptively.

Background technique

[0002] Image question answering is an emerging task at the intersection of computer vision and natural language processing. Given a question about an image, the machine is expected to produce the corresponding answer automatically. Compared with image description, another task at the intersection of computer vision and natural language processing, it requires the machine to understand both the image and the question and get the correct res...
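One concrete way to model the positional relationship between entities and use it to adjust attention weights, as [0001] describes, is the object-relation module from Relation Networks for object detection (Hu et al., 2018), which gates appearance-based attention with a weight computed from the relative geometry of each pair of boxes. The sketch below swaps in a simplified version of that technique as an illustration; it is not the patent's own formulation. The function names, the single linear geometry scorer `w_g` (in place of the original sinusoidal embedding), the clipping constant, and reading {x, y, w, h} as top-left corner plus size are all assumptions.

```python
import numpy as np


def relative_geometry(G, eps=1e-3):
    """Pairwise log-ratio geometry of boxes G (k, 4) given as (x, y, w, h).

    Entry [m, n] = (log|cx_m - cx_n|/w_m, log|cy_m - cy_n|/h_m,
    log w_n/w_m, log h_n/h_m); eps clips the zero offset on the diagonal.
    """
    x, y, w, h = G.T
    cx, cy = x + w / 2, y + h / 2  # box centres (assumes x, y is the top-left corner)
    dx = np.log(np.maximum(np.abs(cx[:, None] - cx[None, :]) / w[:, None], eps))
    dy = np.log(np.maximum(np.abs(cy[:, None] - cy[None, :]) / h[:, None], eps))
    dw = np.log(w[None, :] / w[:, None])
    dh = np.log(h[None, :] / h[:, None])
    return np.stack([dx, dy, dw, dh], axis=-1)  # (k, k, 4)


def geometry_gated_attention(V, G, W_q, W_k, w_g, eps=1e-6):
    """Self-attention among targets, gated by a geometry weight per pair.

    V: (k, d) target features; G: (k, 4) boxes.
    W_q, W_k: (d, d_k) projections; w_g: (4,) hypothetical geometry scorer.
    """
    q, kk = V @ W_q, V @ W_k
    app = q @ kk.T / np.sqrt(W_q.shape[1])             # appearance logits (k, k)
    geo = np.maximum(relative_geometry(G) @ w_g, 0.0)  # ReLU geometry weight (k, k)
    logits = app + np.log(np.maximum(geo, eps))        # w_G * exp(w_A) in log space
    logits -= logits.max(axis=1, keepdims=True)
    A = np.exp(logits)
    A /= A.sum(axis=1, keepdims=True)                  # row-normalised attention
    return A @ V, A                                    # refined features, weights


# Usage with five hand-placed boxes and random features/parameters.
rng = np.random.default_rng(0)
k, d, d_k = 5, 8, 4
V = rng.standard_normal((k, d))
G = np.array([[0, 0, 10, 10], [20, 0, 10, 10], [0, 20, 20, 10],
              [40, 40, 5, 5], [10, 10, 30, 30]], dtype=float)
W_q = rng.standard_normal((d, d_k))
W_k = rng.standard_normal((d, d_k))
w_g = rng.standard_normal(4)
refined, A = geometry_gated_attention(V, G, W_q, W_k, w_g)
print(refined.shape, np.allclose(A.sum(axis=1), 1.0))  # (5, 8) True
```

Working in log space (adding `log(w_G)` to the appearance logits before the softmax) gives the same weights as multiplying `w_G` by `exp(w_A)` and normalising, but avoids dividing by zero when the ReLU zeroes every geometry weight in a row.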


Application Information


IPC(8): G06N3/04; G06N3/08; G06N5/04

CPC: G06N3/049; G06N3/08; G06N5/04; G06N3/045

Inventors: 余宙 (Yu Zhou), 俞俊 (Yu Jun), 汪亮 (Wang Liang)

Owner HANGZHOU DIANZI UNIV
