Visual question answering method based on a combined relational attention network
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- 成都澳海川科技有限公司
- Publication Date
- 2019-09-10
Description
Technical field
[0001] The invention belongs to the technical field of visual question answering (VQA), and more specifically relates to a visual question answering method based on a combined relational attention network.
Background art
[0002] In the prior art, visual question answering (VQA) is mainly divided into two steps: 1) understand the content of the image and the text question, and extract image features and question features; 2) fuse the image features and question features to obtain a multimodal feature representation, and then predict the answer to the question through a softmax classifier. In this process, the attention mechanism helps the model better understand the image and the question by focusing on the image regions related to the question and on the keywords in the question.
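As a rough illustration of the two-step pipeline described in [0002], the following minimal Python sketch applies question-guided attention over image region features, fuses the attended image feature with the question feature, and predicts an answer distribution with a softmax classifier. The function names, the dot-product attention score, the element-wise product fusion, and the toy feature dimensions are assumptions made for illustration only; they are not taken from the patent.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attend(region_feats, question_feat):
    """Weight each image region by its relevance to the question.

    Assumption: relevance is a plain dot product between the region
    feature and the question feature.
    """
    weights = softmax([dot(r, question_feat) for r in region_feats])
    dim = len(question_feat)
    attended = [
        sum(w * r[i] for w, r in zip(weights, region_feats))
        for i in range(dim)
    ]
    return attended, weights

def predict_answer(region_feats, question_feat, classifier_weights):
    """Fuse image and question features, then classify with softmax.

    Assumption: fusion is an element-wise product, and the classifier
    is a single linear layer with one weight row per candidate answer.
    """
    attended, _ = attend(region_feats, question_feat)
    fused = [v * q for v, q in zip(attended, question_feat)]
    logits = [dot(w, fused) for w in classifier_weights]
    return softmax(logits)

# Toy usage: two image regions, a 2-d question feature, three candidate answers.
regions = [[1.0, 0.0], [0.0, 1.0]]
question = [2.0, 0.0]
classifier = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
answer_probs = predict_answer(regions, question, classifier)
```

In this toy setup the first region aligns with the question feature, so it receives the larger attention weight, and `answer_probs` is a probability distribution over the three hypothetical candidate answers.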
[0003] For feature fusion, most current methods are based on bilinear networks, which can effectively combine image featur...