Visual question answering method and device based on question semantic mapping

A visual question answering technology based on question semantic mapping, applied in the field of machine learning. It addresses the tendency of existing methods to ignore both the role of the question in deriving the answer and the mapping relationship between questions and answer ranges, improving stability and accuracy and reducing the probability of wrong results.

Pending Publication Date: 2021-09-21
NANJING UNIV

AI Technical Summary

Problems solved by technology

[0004] Most existing methods extract the relevant image features through mechanisms such as attention and fuse the information of the two modalities. However, these methods still have shortcomings.
Most of them ignore the fact that the question itself plays an important role in deriving the answer. The type and form of the question determine the range of possible answers, yet current methods often ignore this mapping relationship between the question and the answer range.
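
The idea that a question's type bounds its answer range can be illustrated with a small sketch. The typing rule used here (the first two words of the question) and the sample data are assumptions made for illustration only; the source does not state how questions are classified.

```python
from collections import Counter, defaultdict


def question_type(question, n_words=2):
    """Hypothetical typing rule: the lowercased first n_words of the question."""
    words = question.lower().rstrip("?").split()
    return " ".join(words[:n_words])


def answer_ranges(samples):
    """Map each question type to a Counter of the answers seen for that type."""
    ranges = defaultdict(Counter)
    for question, answer in samples:
        ranges[question_type(question)][answer.lower()] += 1
    return dict(ranges)


# Made-up training pairs, for illustration only.
train = [
    ("What color is the car?", "red"),
    ("What color is the sky?", "blue"),
    ("What color is the bus?", "red"),
    ("How many dogs are there?", "2"),
    ("Is the man smiling?", "yes"),
]

ranges = answer_ranges(train)
for qtype, counts in ranges.items():
    print(qtype, dict(counts))
```

In this toy data, "what color" questions are only ever answered with color words, so a model that knows the question type can discount answers such as "2" or "yes" before looking at the image.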



Examples


Embodiment Construction

[0020] The technical solution of the present disclosure is described in detail below with reference to the accompanying drawings. In the description of the present invention, the terms "first" and "second" are used for descriptive purposes only, to distinguish different components; they are not to be interpreted as indicating or implying relative importance or the quantity of the indicated technical features.

[0021] Figure 1 is a flow chart of the visual question answering method based on question semantic mapping according to the present invention. The method specifically includes:

[0022] S1. Extract features from the images in the training set. Image region features are extracted by a pre-trained, Faster R-CNN-based ResNet network, yielding multiple distinct region feature vectors V for each image in the training set.

[0023] Specifically, ResNet uses the pr...


Abstract

The invention discloses a visual question answering method and device based on question semantic mapping. The method comprises: extracting visual features from the images in a training set and question features from the questions; fusing the extracted visual features and question features; classifying the questions in the training set and counting the answer ranges of the different question types; extracting answer features from the answers in each answer range; establishing a mapping relationship between the question features and the answer features to obtain an answer range probability distribution; and reasoning over the fused features and the answer range probability distribution to obtain the final answer. Specifically, question features are extracted by a gated recurrent unit (GRU); embedding learning is used to extract the semantics shared by the question and the answer and to map those semantics to the answer; high-level semantic information of the image is extracted through a question-oriented regional attention mechanism and a relationship; and the image high-level semantic information is fused with the answer range mapping result to generate the final answer.
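
The final reasoning step, which combines the fused image-question features with the answer range probability distribution, could be sketched as follows. The abstract does not say how the two are combined; the product-and-renormalize rule, the `eps` smoothing term, and all numbers below are assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine(fused_scores, range_probs, eps=1e-6):
    """Reweight classifier probabilities by the answer range distribution.

    Assumed rule: elementwise product followed by renormalization.
    """
    probs = softmax(fused_scores)
    weighted = [p * (r + eps) for p, r in zip(probs, range_probs)]
    total = sum(weighted)
    return [w / total for w in weighted]


answers = ["red", "blue", "2", "yes"]
fused_scores = [1.0, 0.5, 1.2, 0.1]  # hypothetical scores from the fused features
range_probs = [0.6, 0.4, 0.0, 0.0]   # hypothetical range distribution for a color question

final = combine(fused_scores, range_probs)
best = answers[max(range(len(answers)), key=final.__getitem__)]
print(best)
```

With these made-up numbers the fused features alone would rank "2" highest, while the answer range distribution for a color question moves the prediction to "red". That is the kind of error the question-to-answer-range mapping is meant to reduce.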

Description

Technical field

[0001] The invention relates to the technical field of machine learning, and in particular to a visual question answering method and device based on question semantic mapping.

Background

[0002] Visual question answering is a popular research direction in the field of computer vision. Given an image and a question related to that image, the model must understand both the image and the question and then output the answer to the question. The same question therefore has different answers for different images, and the same image calls for a different emphasis of understanding under different questions. This means that visual question answering must interpret the content of the image in different ways according to the question context, and make the necessary inferences about information that cannot be obtained directly from the image, such as the position and size relationship between the subjects involved, and get the final answe...


Application Information

IPC(8): G06K9/62, G06N20/00
CPC: G06N20/00, G06F18/213, G06F18/253, G06F18/214
Inventor: 路通, 马云涛
Owner NANJING UNIV