Image question answering method based on multi-target association deep reasoning

An image question answering technique based on multi-target association, applied in the field of deep neural network structures, which addresses problems such as complex image content and a high degree of freedom.

Active Publication Date: 2019-09-20
HANGZHOU DIANZI UNIV

AI Technical Summary

Problems solved by technology

[0007] Due to the complexity of image content and diverse subjects in natural scenes, the high degree of

Examples

Embodiment Construction

[0064] The detailed parameters of the present invention are described more specifically below.

[0065] As shown in Figure 1, the present invention provides a deep neural network framework for Visual Question Answering.

[0066] The data preprocessing of step (1), together with feature extraction, is carried out on the image and the text as follows:

[0067] 1-1. For image feature extraction, we use the MS-COCO dataset as training and test data and extract visual features with an existing Faster-RCNN model. Specifically, the image is fed into the Faster-RCNN network, which detects and boxes 10 to 100 targets in the image. A 2048-dimensional visual feature is extracted from each target region, giving the visual features V, and the coordinates and size {x, y, w, h} of each target's box are recorded as the geometric features G of the target, where V = {v_1, v_2, ..., v_k}, G = {g_1, g_2, ..., g_k}, and k ∈ [10, 100].
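
The shapes of V and G described in [0067] can be illustrated with a minimal Python sketch. It does not run Faster-RCNN. It assumes the detector has already produced per-region features, boxes in {x, y, w, h} form, and confidence scores as arrays. The function name `pack_region_features` and the rule of keeping the highest-scoring regions are assumptions made for illustration; the text only states that 10 to 100 targets are kept.

```python
import numpy as np

MIN_BOXES = 10    # lower bound on k stated in [0067]
MAX_BOXES = 100   # upper bound on k stated in [0067]
FEAT_DIM = 2048   # visual feature size per target stated in [0067]


def pack_region_features(region_feats, boxes, scores):
    """Return (V, G) for one image, with 10 <= k <= 100 targets.

    region_feats: (n, 2048) per-region visual features (hypothetical detector output)
    boxes:        (n, 4) boxes as {x, y, w, h}
    scores:       (n,) detection confidences (selection by score is an assumption)
    """
    region_feats = np.asarray(region_feats, dtype=np.float32)
    boxes = np.asarray(boxes, dtype=np.float32)
    scores = np.asarray(scores, dtype=np.float32)

    n = region_feats.shape[0]
    if region_feats.shape != (n, FEAT_DIM):
        raise ValueError("region_feats must have shape (n, 2048)")
    if boxes.shape != (n, 4) or scores.shape != (n,):
        raise ValueError("boxes must be (n, 4) and scores must be (n,)")
    if n < MIN_BOXES:
        raise ValueError("at least 10 detected regions are required")

    k = min(n, MAX_BOXES)
    # Keep the k highest-scoring regions, best first.
    order = np.argsort(-scores, kind="stable")[:k]
    V = region_feats[order]   # visual features, shape (k, 2048)
    G = boxes[order]          # geometric features, shape (k, 4)
    return V, G
```

Row i of V and row i of G describe the same target, so later stages can pair the appearance and geometry of each target by index.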

[0068] 1-2. For the qu...

Abstract

The invention discloses an image question answering method based on multi-target association deep reasoning. The method comprises the following steps: (1) preprocessing the image and the natural-language text associated with it; (2) reordering the attention over each target using an adaptive attention module (AAM) model enhanced by the geometric features of the candidate boxes; (3) constructing a neural network structure based on the AAM model; and (4) model training, in which the neural network parameters are trained with the back-propagation algorithm. The invention provides a deep neural network for image question answering. In particular, it models image and question-text data in a unified way, reasons over the features of each target in the image, and reorders the attention over the targets so that questions are answered more accurately, obtaining good results in the field of image question answering.
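
As an illustration of what geometry-enhanced attention over targets can look like, the following Python sketch biases appearance-based attention weights with a term computed from pairwise box geometry. This is not the AAM model of the invention, whose internals are not given in this excerpt. The pairwise log-ratio encoding, the function names, and the weight matrices `Wq`, `Wk` and `w_geo` are all assumptions chosen for illustration.

```python
import numpy as np


def relative_geometry(G, eps=1e-6):
    """Pairwise log-ratio encoding of boxes {x, y, w, h} (an assumed encoding).

    G: (k, 4) array with positive widths and heights. Returns a (k, k, 4) array.
    """
    G = np.asarray(G, dtype=np.float64)
    x, y, w, h = G[:, 0], G[:, 1], G[:, 2], G[:, 3]
    dx = np.log(np.abs(x[:, None] - x[None, :]) / w[:, None] + eps)
    dy = np.log(np.abs(y[:, None] - y[None, :]) / h[:, None] + eps)
    dw = np.log(w[None, :] / w[:, None])
    dh = np.log(h[None, :] / h[:, None])
    return np.stack([dx, dy, dw, dh], axis=-1)


def geometry_biased_attention(V, G, Wq, Wk, w_geo, eps=1e-6):
    """Attend over targets with appearance logits plus a geometry bias.

    V: (k, d_v) visual features, G: (k, 4) boxes,
    Wq, Wk: (d_v, d) projections, w_geo: (4,) geometry weights (all hypothetical).
    Returns (attended features of shape (k, d_v), attention weights of shape (k, k)).
    """
    Q = V @ Wq
    K = V @ Wk
    appearance = Q @ K.T / np.sqrt(Q.shape[-1])           # scaled dot-product logits
    geo = np.maximum(relative_geometry(G) @ w_geo, 0.0)   # non-negative geometry weight
    logits = appearance + np.log(geo + eps)               # geometry acts as a log-bias
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights
```

In this sketch a pair of targets whose geometry weight is zero receives almost no attention, so spatial layout can reorder which targets matter for each other. In a trained model the hypothetical matrices `Wq`, `Wk` and `w_geo` would be learned by back-propagation rather than supplied by hand.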

Description

Technical field

[0001] The present invention relates to a deep neural network structure for the visual question answering (Visual Question Answering) task. In particular, it relates to a method that models image and question-answer data in a unified way, finds the interactions between the features of each entity in the image and the geometric features of the corresponding spatial positions, and adaptively adjusts the attention weights by modeling the positional relationships between the entities.

Background

[0002] Image question answering is an emerging task at the intersection of computer vision and natural language processing. Given a question about an image, the task requires the machine to produce the answer automatically. Compared with image description, another task that spans computer vision and natural language processing, it is necessary for the machine to understand images and questions and get the correct res...

Application Information

IPC(8): G06N3/04, G06N3/08, G06N5/04
CPC: G06N3/049, G06N3/08, G06N5/04, G06N3/045
Inventors: 余宙, 俞俊, 汪亮
Owner: HANGZHOU DIANZI UNIV