Image question-answering method based on multi-scale deep learning

A deep learning, multi-scale technology applied in the fields of instruments, biological neural network models, and character and pattern recognition. It addresses the problem that no existing model explores the effectiveness of multi-scale image features, and it achieves the effect of improving prediction accuracy.

Status: Inactive
Publication Date: 2018-06-01
SOUTH CHINA UNIV OF TECH
3 Cites, 32 Cited by
AI Technical Summary

Problems solved by technology

However, in the current image question answering system, there is no model that attempts to explore the effectiveness of multi-scale image features in image question answering and explain how the deep neural network model for image question answering predicts the answer.

Examples

Embodiment

[0045] Figure 1 is a schematic flow diagram of the image question answering method based on multi-scale deep learning disclosed in the present invention. As shown in Figure 1, the specific technical solution includes the following steps:

[0046] Step S1. For each image in the training dataset, produce images at three scales: large, medium, and small. The original image is first scaled to a set size to serve as the large-scale image; the medium-scale and small-scale images are then cropped from the center of the large-scale image. In this embodiment, images of differing original sizes are uniformly resized to 448×448 to form the large-scale image; the medium-scale image is 352×352 and the small-scale image is 224×224.
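Step S1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: images are modeled as plain 2-D lists of pixel values, nearest-neighbour interpolation is assumed because the text names no resizing method, and the helper names `resize_nearest`, `center_crop`, and `multi_scale` are hypothetical.

```python
from typing import Dict, List, Sequence

Image = List[List[int]]  # hypothetical stand-in: a 2-D grid of pixel values


def resize_nearest(img: Sequence[Sequence[int]], size: int) -> Image:
    """Resize to size x size with nearest-neighbour sampling.

    Assumption: the patent does not name an interpolation method.
    """
    h, w = len(img), len(img[0])
    rows = [min(h - 1, (y * h) // size) for y in range(size)]
    cols = [min(w - 1, (x * w) // size) for x in range(size)]
    return [[img[r][c] for c in cols] for r in rows]


def center_crop(img: Sequence[Sequence[int]], size: int) -> Image:
    """Cut a size x size window from the center of the image."""
    h, w = len(img), len(img[0])
    if size > h or size > w:
        raise ValueError("crop size exceeds image size")
    top = (h - size) // 2
    left = (w - size) // 2
    return [list(row[left:left + size]) for row in img[top:top + size]]


def multi_scale(img: Sequence[Sequence[int]],
                large: int = 448,
                medium: int = 352,
                small: int = 224) -> Dict[str, Image]:
    """Build the three scales described in Step S1.

    The large image is the resized original; the medium and small images
    are both cropped from the center of the large image.
    """
    big = resize_nearest(img, large)
    return {
        "large": big,
        "medium": center_crop(big, medium),
        "small": center_crop(big, small),
    }
```

With the default sizes, the medium crop starts 48 pixels in from each edge of the large image and the small crop starts 112 pixels in, so all three scales share the same center.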

[0047] Step S2. For images of different scales, use the pre-trained convolutional neural network to extract the image features of each scale, and...

Abstract

The invention discloses an image question-answering method based on multi-scale deep learning. The method is inspired by the cognitive behavior of people during image question answering. The method comprises the following steps: 1) rescaling the same picture into three pictures of different scales as required, and using a pre-trained convolutional neural network to extract picture features, thereby obtaining multi-scale picture features; 2) obtaining the feature representation of the question sentence with a recurrent neural network, and obtaining n-gram question feature representations by attaching convolution layers with different kernel sizes; 3) exploring the intrinsic correlation between the picture features of different scales and the n-gram question feature representations, namely a similarity measurement; 4) finally, fusing the picture features of different scales with the n-gram question features, and inferring the answer to the question with a hierarchical network structure that proceeds from the large scale to the medium scale and then to the small scale. The invention simulates the cognitive behavior of people during image question answering, and it achieves high accuracy on a benchmark dataset.
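The idea in step 2 of sliding convolution kernels of different sizes over the question's token features can be sketched as below. This is a simplified illustration under assumptions that are not in the source: kernel sizes 1, 2, and 3 are assumed, the window is zero-padded on the right so the output keeps the input length, and uniform averaging weights stand in for the learned weights a real convolution layer would have. The names `conv1d_same` and `ngram_features` are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def conv1d_same(seq: Sequence[Vector], kernel: Sequence[float]) -> List[Vector]:
    """Slide a scalar-weighted window along the token axis.

    Positions past the end of the sequence count as zero, so the output
    has one vector per input token.
    """
    dim = len(seq[0])
    out: List[Vector] = []
    for t in range(len(seq)):
        acc = [0.0] * dim
        for j, weight in enumerate(kernel):
            if t + j < len(seq):
                for d in range(dim):
                    acc[d] += weight * seq[t + j][d]
        out.append(acc)
    return out


def ngram_features(seq: Sequence[Vector],
                   sizes: Sequence[int] = (1, 2, 3)) -> Dict[int, List[Vector]]:
    """Return one feature sequence per kernel size n.

    Placeholder weights: each kernel simply averages n consecutive token
    vectors; a trained model would learn these weights instead.
    """
    return {n: conv1d_same(seq, [1.0 / n] * n) for n in sizes}
```

Each kernel size n yields a sequence in which every position summarizes n consecutive tokens, which is what lets the later steps compare picture features against phrases of different lengths.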

Description

Technical field

[0001] The invention relates to the multi-modal technical field of image processing and natural language processing, and in particular to an image question answering method based on multi-scale deep learning.

Background art

[0002] Image question answering is a challenging task that has emerged in recent years. Given a picture and a question related to the picture, an image question answering system automatically generates an answer. Image question answering has many potential applications, such as helping blind or visually impaired people access information from websites or the real world. In addition, it can be used in human-computer interaction to query visual content.

[0003] At present, most of the algorithms that have been proposed to solve image question answering tasks use convolutional neural networks to extract the features of the entire image and recurrent neural networks to extract the features of question sentences, and then le...

Application Information

IPC(8): G06K9/62, G06K9/46, G06N3/04
CPC: G06V10/462, G06N3/045, G06F18/214
Inventors: 马千里, 余柳红
Owner: SOUTH CHINA UNIV OF TECH