Image content question answering method based on multi-modal low-rank bilinear pooling

A bilinear pooling technique for image content question answering, applied in the field of deep neural networks, that addresses the problem of high computational complexity.

Active Publication Date: 2017-12-15
HANGZHOU DIANZI UNIV
Cites: 5; Cited by: 38

AI Technical Summary

Problems solved by technology

Feature fusion models based on the bilinear model have performed very well in many fields, such as fine-grained image classification, natural language processing, and recommendation systems. However, they also pose a major challenge.



Examples


Embodiment Construction

[0102] The detailed parameters of the present invention are described further below.

[0103] As shown in Figure 1, the present invention provides a deep neural network structure for image content question answering (Image Question Answering, IQA). The specific steps are as follows:

[0104] The data preprocessing and the feature extraction for images and text described in step (1) are carried out as follows:

[0105] The COCO-VQA dataset is used here as training and testing data.

[0106] 1-1. For image data, an existing 152-layer deep residual network (ResNet-152) model is used to extract image features. Specifically, we uniformly scale the images to 448×448, feed them into the deep residual network, and take the output of its res5c layer as the image feature.
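The size of the resulting image feature can be worked out with a little arithmetic. The sketch below is illustrative only: the overall stride of 32 and the 2048 output channels are properties of the standard ResNet-152 architecture that are assumed here, not figures stated in the text, and the helper name `res5c_feature_shape` is hypothetical.

```python
# Assumed constants for the standard ResNet-152 (not stated in the text above).
RESNET_TOTAL_STRIDE = 32   # overall downsampling factor up to the last residual stage
RES5C_CHANNELS = 2048      # channel width of the last residual stage


def res5c_feature_shape(height, width):
    """Return (channels, rows, cols) of the res5c feature map for an input size.

    Hypothetical helper: it only handles input sizes divisible by 32,
    which covers the 448x448 case described in the text.
    """
    if height % RESNET_TOTAL_STRIDE or width % RESNET_TOTAL_STRIDE:
        raise ValueError("this sketch only handles sizes divisible by 32")
    return (
        RES5C_CHANNELS,
        height // RESNET_TOTAL_STRIDE,
        width // RESNET_TOTAL_STRIDE,
    )


channels, rows, cols = res5c_feature_shape(448, 448)
print(channels, rows, cols)  # 2048 14 14
print(rows * cols)           # 196 spatial locations per image
```

Under these assumptions, each 448×448 image yields a 14×14 grid of 2048-dimensional feature vectors.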

[0107] 1-2. For question text data, we first split each question into words and build a word dictionary over the questions. Only the first 15 words of each question are kept, and if the ...
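The question preprocessing can be sketched roughly as follows. This is a minimal illustration under assumptions, not the patent's implementation: the tokenization rule, the `<unk>` entry, and the function names `tokenize`, `build_vocab`, and `encode` are all hypothetical. Only the 15-word limit comes from the text. How questions shorter than 15 words are handled is elided in the source, so the sketch leaves them at their natural length.

```python
import re
from collections import Counter

MAX_QUESTION_LEN = 15  # from the text: only the first 15 words are kept


def tokenize(question):
    """Lowercase the question and split it into word tokens (assumed rule)."""
    return re.findall(r"[a-z0-9']+", question.lower())


def build_vocab(questions):
    """Map each word to an integer id; id 0 is reserved for unknown words (assumption)."""
    counts = Counter(word for q in questions for word in tokenize(q))
    vocab = {"<unk>": 0}
    for word in sorted(counts):
        vocab[word] = len(vocab)
    return vocab


def encode(question, vocab, max_len=MAX_QUESTION_LEN):
    """Convert a question to word ids, keeping only the first max_len words."""
    tokens = tokenize(question)[:max_len]
    return [vocab.get(word, vocab["<unk>"]) for word in tokens]


questions = ["What color is the cat?", "Is the cat on the mat?"]
vocab = build_vocab(questions)
print(encode("What color is the cat?", vocab))  # [7, 2, 3, 6, 1]
```

Sorting the words before assigning ids keeps the dictionary deterministic across runs, which is a convenience of this sketch rather than something the text requires.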


Abstract

The invention discloses an image content question answering method based on multi-modal low-rank bilinear pooling. The method comprises the following steps: (1) preprocessing the images and the natural-language question and answer texts; (2) fusing features with a multi-modal low-rank bilinear pooling model; (3) building a neural network structure based on the MFB pooling model and a co-attention model; (4) training the model, using the back-propagation algorithm to learn the neural network parameters. The invention provides a neural network model for image question answering, in particular a method for uniformly modeling the cross-media image-question data in image question answering and a network structure that learns "co-attention" to model the fine-grained features of images and questions. It obtains the best results currently reported in the field of image question answering.
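For orientation, the low-rank factorization that the term "low-rank bilinear pooling" usually refers to can be written as below. This is a sketch under assumptions: the symbols $x$, $y$, $W_i$, $U_i$, $V_i$, $m$, $n$, and $k$ are introduced here for illustration and are not defined in the abstract.

```latex
% Assumed notation: x \in \mathbb{R}^{m} is an image feature,
% y \in \mathbb{R}^{n} is a question feature, z_i is one output unit.
\begin{align}
  z_i &= x^{\top} W_i \, y,
      & W_i &\in \mathbb{R}^{m \times n} \\
  W_i &= U_i V_i^{\top},
      & U_i &\in \mathbb{R}^{m \times k},\; V_i \in \mathbb{R}^{n \times k} \\
  z_i &= x^{\top} U_i V_i^{\top} y
       = \mathbf{1}^{\top} \left( U_i^{\top} x \circ V_i^{\top} y \right)
\end{align}
```

Here $\circ$ is the element-wise product and $\mathbf{1}$ is the all-ones vector of length $k$. Under this factorization each output unit needs $k(m+n)$ parameters instead of $mn$, which is where the reduction in computational complexity comes from.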

Description

Technical field

[0001] The present invention relates to a deep neural network for image question answering (Image Question Answering, IQA), and in particular to a method for uniformly modeling image-question cross-media data and learning "co-attention" for modeling and representation.

Background

[0002] Unified "cross-media" representation is a research direction at the intersection of computer vision and natural language processing. It aims to bridge the "semantic gap" between different media (such as images and text) and establish a unified semantic representation. Building on the theory and methods of unified cross-media representation, several currently active research directions have emerged, such as image captioning (Image Captioning), image-text cross-media retrieval (Image-Text Cross-media Retrieval), and automatic question answering on image content (Image Question Answering, IQA). The goal of image captioning is to give an image a summary of its c...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27, G06N3/08
CPC: G06F16/583, G06F40/289, G06N3/084
Inventors: 俞俊, 余宙, 项晨钞
Owner: HANGZHOU DIANZI UNIV