A visual question answering method based on the fusion of fine-grained image features and external knowledge

The technology combines fine-grained image features with external knowledge and is applied in the field of visual question answering. It addresses limited application scenarios, poor applicability to fine-grained visual questions, and poor answer quality on visual questions about fine-grained images, and achieves improved applicability and higher accuracy.

Active Publication Date: 2021-07-20
NORTHWESTERN POLYTECHNICAL UNIV
10 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

The visual question answering task requires the computer to deeply understand both the content of the image in the visual question and the semantics of the question. Answering some questions also requires the computer to master relevant common sense or specific knowledge. Research on visual question answering therefore involves many artificial intelligence technologies, including fine-grained recognition, object recognition, behavior recognition, and natural language processing. As a result, visual question answering places higher demands on image semantic understanding, and poses greater challenges, than traditional computer vision research.
[0003] Existing studies on visual question answering rely on global image features and cannot obtain fine-grained visual features that are highly correlated with the question text, so they apply poorly to fine-grained visual questions. Most methods focus only on the content of the visual question itself, which greatly limits their application scenarios. They also answer visual questions about fine-grained images poorly and cannot perform reasoning on the basis of the visual question.

Examples

Specific Embodiment

[0069] 1. Use the NLTK part-of-speech tagging tool to split the visual question sentences in the sample library into words, and build a dictionary from the word segmentation results. Each word in the dictionary corresponds to a unique number.
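
The dictionary-building step can be illustrated with a minimal sketch. The source uses NLTK; the regular-expression tokenizer below is only a dependency-free stand-in, and the names `tokenize`, `build_vocab`, `encode`, and the reserved `<unk>` entry are hypothetical, not taken from the patent.

```python
import re

def tokenize(question):
    """Lowercase a question and split it into word tokens.

    Assumption: a simple regex stands in for the NLTK tool named in the source.
    """
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", question.lower())

def build_vocab(questions):
    """Give each distinct word a unique integer, in order of first appearance."""
    vocab = {"<unk>": 0}  # hypothetical reserved number for unseen words
    for question in questions:
        for token in tokenize(question):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

def encode(question, vocab):
    """Replace every word of a question with its dictionary number."""
    return [vocab.get(token, vocab["<unk>"]) for token in tokenize(question)]

# Toy sample library of visual questions (illustrative only).
questions = ["What breed is the dog?", "What color is the bird's beak?"]
vocab = build_vocab(questions)
ids = encode("What color is the dog?", vocab)
```

Encoding a question as a list of numbers in this way is what lets a later text feature extractor look up one embedding per word.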

[0070] 2. As shown in Figure 2, the original image is first segmented using an unsupervised image segmentation algorithm. The segmentation result is an image in which each segmented region is marked with a different RGB color value, so the pixel coordinates of each segmented region can be obtained from its RGB color value. From this pixel coordinate information, the parts of the image feature map that correspond to each segmented region of the original image can be determined. After processing, the segmentation result images are resized to a uniform 224×224×3.
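
A rough sketch of recovering region coordinates from a color-coded segmentation result, and of mapping them onto a coarser feature-map grid, is shown here. The function names and the `stride` parameter are assumptions made for illustration. The source text is truncated before it states which feature map is used; if features were taken after the last pooling layer of VGG-16, a 224×224 input would give a 7×7 grid, that is, a stride of 32.

```python
from collections import defaultdict

def regions_from_color_map(seg):
    """Group pixel coordinates (row, column) by the RGB color that marks them.

    seg is a list of rows; every pixel is an (r, g, b) tuple.
    """
    regions = defaultdict(list)
    for y, row in enumerate(seg):
        for x, color in enumerate(row):
            regions[tuple(color)].append((y, x))
    return dict(regions)

def feature_cells(pixels, stride):
    """Return the feature-map cells that cover the given pixels.

    Assumption: one feature-map cell covers a stride x stride pixel block.
    """
    return sorted({(y // stride, x // stride) for y, x in pixels})

# Toy 4x4 segmentation result with two regions (illustrative only).
RED, BLUE = (255, 0, 0), (0, 0, 255)
seg = [
    [RED, RED, BLUE, BLUE],
    [RED, RED, BLUE, BLUE],
    [RED, RED, RED, RED],
    [RED, RED, RED, RED],
]
regions = regions_from_color_map(seg)
blue_cells = feature_cells(regions[BLUE], stride=2)
red_cells = feature_cells(regions[RED], stride=2)
```

Each region then owns a set of feature-map cells, from which a per-region visual feature could be pooled.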

[0071] A VGG-16 network with weights pre-trained on ImageNet, with its fully connected layers and Softmax layer removed, is used as the image feature extractor. T...

Abstract

The invention discloses a visual question answering method based on the fusion of fine-grained image features and external knowledge. The method consists of four steps: fine-grained image feature extraction, text processing and feature extraction, question knowledge retrieval based on an external knowledge base, and multimodal feature fusion and answer prediction. Fine-grained image feature extraction extracts the visual features of the image's sub-regions. Text processing and feature extraction processes the visual question sentences and obtains the overall features of the questions. Question knowledge retrieval introduces the Freebase knowledge graph as the model's external knowledge base, supplementing the common sense or specific knowledge needed for answer prediction. Multimodal feature fusion and answer prediction fuses the multimodal features with a similarity-based feature fusion method and uses the fused visual question features to predict the answer to the question. The proposed method achieves higher prediction accuracy for answers to visual questions.
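
This excerpt names a similarity-based feature fusion method but does not give its formula. The sketch below is therefore only one plausible reading, not the patent's definition: each region feature is weighted by the softmax of its cosine similarity to the question feature, and the weighted sum is concatenated with the question feature. The function names, the use of cosine similarity and softmax, the concatenation, and the assumption that both modalities share one dimensionality are all illustrative choices.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(question_vec, region_vecs):
    """Weight regions by similarity to the question, then concatenate.

    Assumption: question and region features have the same dimensionality.
    """
    weights = softmax([cosine(question_vec, r) for r in region_vecs])
    attended = [
        sum(w * r[i] for w, r in zip(weights, region_vecs))
        for i in range(len(question_vec))
    ]
    return weights, question_vec + attended  # list concatenation

# Toy features: the first region points the same way as the question.
question_vec = [1.0, 0.0]
region_vecs = [[1.0, 0.0], [0.0, 1.0]]
weights, fused = fuse(question_vec, region_vecs)
```

In this toy case the first region receives the larger weight, so the fused vector leans toward the region most similar to the question.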

Description

Technical field

[0001] The invention belongs to the field of intelligent information processing, and in particular relates to a visual question answering method.

Background technique

[0002] Visual Question Answering (VQA) is an interdisciplinary subject combining computer vision and natural language processing research. Its research goal is to enable computers to predict the answers to visual questions. The specific process is to input an image and an open question related to the image to the computer. The visual question answering system first needs to understand the semantics of the visual question text, and then combine the visual information of the image related to the question to predict the answer. The visual question answering task requires the computer to be able to deeply understand the content of the image in the visual question and the semantics of the question. The answer to some questions also requires the computer to master relevant common sense or specific k...

Claims

Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F16/332, G06F16/36, G06K9/32, G06K9/34, G06K9/62, G06N3/04, G06N3/08
CPC: G06F16/3329, G06F16/367, G06N3/08, G06V10/267, G06V10/25, G06V10/751, G06N3/045, G06F18/2411, G06F18/253
Inventor: 宋凌云, 李建鳌, 尚学群, 俞梦真, 彭杨柳, 李伟, 李战怀
Owner: NORTHWESTERN POLYTECHNICAL UNIV