Method for solving video question answering using a hierarchical encoder-decoder network mechanism
A hierarchical encoding and decoding technique, applied in the field of video question answering and answer generation, that addresses the lack of semantic feature modeling for video
Embodiment
[0097] The present invention is experimentally verified on a self-constructed dataset containing 50,000 video clips and 200,000 text descriptions. 70% of the data is used as the training set, 10% as the validation set, and 20% as the test set:
[0098] 1) For each video in the dataset, all of its frames are used to form the frame-level representation of that video. Each frame is resized to 224×224, and a pre-trained VGGNet is then used to extract a 4096-dimensional feature for each frame.
[0099] 2) For questions and answers, the present invention uses a pre-trained word2vec model to extract their semantic representations. Specifically, the word set contains 5,000 words, and each word vector is 256-dimensional.
[0100] 3) The vocabulary size is set to 8,500, and two special tokens are added to encode sentence endings and out-of-vocabulary words, respectively…
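The 70/10/20 partition described in [0097] can be sketched as a seeded random split. This is a minimal illustration, assuming the split is made at the level of video clips and that a random shuffle is acceptable; the source states neither, and `split_dataset` and its parameters are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(items: Sequence[T], train_frac: float = 0.7, val_frac: float = 0.1,
                  seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle `items` reproducibly and cut them into train / validation / test lists.

    Whatever remains after the train and validation slices (20% by default) is the test set.
    """
    if train_frac < 0 or val_frac < 0 or train_frac + val_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


clip_ids = list(range(50_000))  # hypothetical clip identifiers
train, val, test = split_dataset(clip_ids)
print(len(train), len(val), len(test))  # 35000 5000 10000
```

Splitting by clip rather than by description keeps every description of a given clip in the same partition, so test clips are never seen during training.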
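Step 1) turns each video into a sequence of fixed-size frames and then into a matrix of per-frame features (number of frames × 4096; the 4096 size matches the width of VGGNet's fully connected layers). The sketch below shows only that data flow, assuming single-channel frames held as nested lists and a nearest-neighbour resize. The real pipeline would use RGB images, an image library, and a pre-trained VGGNet, which is replaced here by a hypothetical `placeholder_extractor`; all function names are illustrative.

```python
from typing import Callable, List, Sequence

Frame = List[List[float]]  # single-channel H x W pixel grid, simplified for illustration
FRAME_SIZE = 224           # step 1) resizes every frame to 224 x 224
FEAT_DIM = 4096            # per-frame VGGNet feature dimension


def resize_nearest(frame: Frame, out_h: int = FRAME_SIZE, out_w: int = FRAME_SIZE) -> Frame:
    """Nearest-neighbour resize of a 2-D pixel grid to out_h x out_w."""
    in_h, in_w = len(frame), len(frame[0])
    return [[frame[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def frame_level_representation(frames: Sequence[Frame],
                               extract: Callable[[Frame], List[float]]) -> List[List[float]]:
    """Resize each frame and map it to a FEAT_DIM vector; result is num_frames x FEAT_DIM."""
    features = []
    for frame in frames:
        vec = extract(resize_nearest(frame))
        if len(vec) != FEAT_DIM:
            raise ValueError(f"expected a {FEAT_DIM}-d feature, got {len(vec)}")
        features.append(vec)
    return features


def placeholder_extractor(frame: Frame) -> List[float]:
    # Hypothetical stand-in for a pre-trained VGGNet: repeats the frame mean FEAT_DIM times.
    mean = sum(map(sum, frame)) / (len(frame) * len(frame[0]))
    return [mean] * FEAT_DIM


video = [[[float(t)] * 320 for _ in range(240)] for t in range(3)]  # 3 dummy 240x320 frames
reps = frame_level_representation(video, placeholder_extractor)
print(len(reps), len(reps[0]))  # 3 4096
```

Keeping the extractor as a parameter separates the fixed preprocessing (resize, shape check) from the choice of backbone.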
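Steps 2) and 3) can be sketched as a frequency-based vocabulary capped at 8,500 entries, an encoder that maps out-of-vocabulary words to one special token and appends an end-of-sentence token, and a lookup of 256-dimensional word vectors. The token strings `<eos>` and `<unk>`, the zero vector for words without a pre-trained embedding, and all function names are assumptions for illustration; the source does not give them. The `vectors` argument is a plain dict here, standing in for the trained word2vec vectors.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping, Sequence

# Hypothetical token names: the source's actual strings for these tokens are not given.
EOS, UNK = "<eos>", "<unk>"
EMBED_DIM = 256  # word-vector dimension stated in step 2)


def build_vocab(sentences: Iterable[str], size: int = 8500) -> Dict[str, int]:
    """Map EOS, UNK and the most frequent words to ids, capped at `size` entries."""
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    vocab = {EOS: 0, UNK: 1}
    for word, _ in counts.most_common():
        if len(vocab) >= size:
            break
        if word not in vocab:  # never let a corpus word overwrite a special token
            vocab[word] = len(vocab)
    return vocab


def encode(sentence: str, vocab: Mapping[str, int]) -> List[int]:
    """Convert a sentence to ids: OOV words become UNK and EOS is appended."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in sentence.lower().split()]
    return ids + [vocab[EOS]]


def embed(sentence: str, vectors: Mapping[str, Sequence[float]],
          dim: int = EMBED_DIM) -> List[List[float]]:
    """Look up one vector per token; tokens without a vector get zeros (an assumption)."""
    return [list(vectors[tok]) if tok in vectors else [0.0] * dim
            for tok in sentence.lower().split()]


corpus = ["what is the man doing", "the man is running"]
vocab = build_vocab(corpus, size=5)
print(encode("the man is swimming", vocab))  # [3, 4, 2, 1, 0]
toy_vectors = {"man": [0.5] * EMBED_DIM}  # stand-in for trained word2vec vectors
print(len(embed("the man", toy_vectors)), len(embed("the man", toy_vectors)[0]))  # 2 256
```

In this toy run the cap of 5 leaves room for only the three most frequent words, so "swimming" is encoded as the out-of-vocabulary id 1 before the end-of-sentence id 0.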