Visual question answering method optimized using position information
The technology combines position information with visual features and is applied in the field of visual question answering. It addresses problems such as complex real-world scenes, a narrow scope of application, and poor performance, and achieves the effects of optimizing time efficiency, reducing training time, and improving performance.
Examples
Embodiment 1
[0120] A visual question answering method optimized using position information, as shown in Figure 1, includes the following steps:
[0121] S1. Collect training data, consisting of images and questions related to each given image, and then manually annotate the answers to the questions; this embodiment directly uses the VQA v2.0 data set.
[0122] S2. Build a question preprocessing module that preprocesses the input question to obtain the semantic feature vector and the position feature vector of the question, including the following steps:
[0123] S2.1. Calculate the semantic feature vector of each word in the input question: initialize each word with its GloVe word embedding, and then feed it into a long short-term memory (LSTM) network to obtain the semantic feature vector of that word. Since questions differ in length, each question is zero-padded or truncated, so that it is represented by a semantic feature matrix of dimension N × d_1, where N is t...
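The zero-padding and truncation described in S2.1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `pad_or_truncate` and the toy values of N and d_1 are assumptions, and the per-word vectors are taken as already computed (the GloVe lookup and the LSTM are not shown).

```python
from typing import List


def pad_or_truncate(word_vectors: List[List[float]], n: int, dim: int) -> List[List[float]]:
    """Return exactly n rows of length dim.

    Questions longer than n words are cut to their first n word vectors;
    shorter questions are filled with zero vectors at the end.
    """
    for row in word_vectors:
        if len(row) != dim:
            raise ValueError("every word vector must have length dim")

    # Keep at most n word vectors (truncation), copying rows to avoid aliasing.
    fixed = [list(row) for row in word_vectors[:n]]

    # Append zero vectors until the matrix has n rows (padding).
    missing = n - len(fixed)
    fixed.extend([0.0] * dim for _ in range(missing))
    return fixed


# Hypothetical toy question with 3 words and d_1 = 2.
question = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
short_form = pad_or_truncate(question, n=2, dim=2)  # truncated to 2 rows
long_form = pad_or_truncate(question, n=5, dim=2)   # padded to 5 rows
```

Either way the result has shape N × d_1, so questions of different lengths can be batched together.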
Embodiment 2
[0228] This embodiment differs from Embodiment 1 in that it uses the VQA v1.0 data set: b in step S7 is 248349, that is, the size of the VQA v1.0 training set, and r in step S7 is 2410, that is, the number of candidate answers for VQA v1.0.
Embodiment 3
[0230] This embodiment differs from Embodiment 1 in that it uses the COCO-QA data set: b in step S7 is 78736, that is, the size of the COCO-QA training set, and r in step S7 is 435, that is, the number of candidate answers for COCO-QA.
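The data-set-specific values of b and r given for Embodiments 2 and 3 could be kept in a small configuration table such as the one below. This is only a sketch: the class and field names are hypothetical, and Embodiment 1 is left out because its values of b and r are not stated in this excerpt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetConfig:
    name: str
    train_size: int             # b in step S7: size of the training set
    num_candidate_answers: int  # r in step S7: number of candidate answers


# Keyed by embodiment number; values are taken from the text above.
EMBODIMENT_CONFIGS = {
    2: DatasetConfig("VQA v1.0", train_size=248349, num_candidate_answers=2410),
    3: DatasetConfig("COCO-QA", train_size=78736, num_candidate_answers=435),
}

config = EMBODIMENT_CONFIGS[3]
print(f"{config.name}: b={config.train_size}, r={config.num_candidate_answers}")
```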