Method for solving video question and answer by using hierarchical codec network mechanism

A hierarchical encoder-decoder technique, applied to video question answering and answer generation, that addresses the lack of video semantic feature modeling.

Active Publication Date: 2018-11-06
HANGZHOU YIWISE INTELLIGENT TECH CO LTD
5 Cites, 26 Cited by

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to solve the problems in the prior art. Existing methods for long video question answering lack modeling of video semantic features: a long video contains content with different semantics across its frames, and this content is scattered over different segments of the video. To overcome this, the present invention provides a method for solving open-ended long video question answering using an adaptive hierarchical reinforcement learning codec network mechanism.
[0007] The method uses the hierarchical codec network mechanism to solve the open-ended long video question answering problem and includes the following steps:


Examples


Embodiment

[0097] The present invention is experimentally verified on a self-constructed dataset of 50,000 video clips and 200,000 text descriptions. We use 70% of the data as the training set, 10% as the validation set, and 20% as the test set:
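The 70/10/20 partition described above can be sketched as follows. This is a minimal illustration, not the procedure used in the experiments: the source does not say whether the split is random or whether it is made per video or per question-answer pair, so the shuffling, the fixed seed, and the function name `split_dataset` are all assumptions.

```python
import random


def split_dataset(items, train_pct=70, val_pct=10, seed=0):
    """Shuffle items and split them into train/validation/test lists.

    Percentages are integers so the split sizes are computed with exact
    integer arithmetic; the remainder after train and validation goes to test.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # assumed: random split, fixed seed
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical video identifiers standing in for the 50,000 clips.
train, val, test = split_dataset(range(50_000))
print(len(train), len(val), len(test))
```

Splitting by video identifier, as assumed here, keeps all questions about one clip in the same partition.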

[0098] 1) For each video in the dataset, all frames are used as the frame-level representation of that video. Each frame is resized to 224×224, and a pre-trained VGGNet is then used to obtain a 4096-dimensional feature representation of each frame.
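The frame resizing step can be illustrated with a small dependency-free sketch. The source does not name the interpolation method, so nearest-neighbor sampling is an assumption here, as is the function name `resize_nearest`; the subsequent feature extraction with a pre-trained VGGNet is not shown.

```python
def resize_nearest(frame, out_h=224, out_w=224):
    """Resize a frame (a list of rows of pixel values) to out_h x out_w.

    Uses nearest-neighbor sampling (an assumed choice): each output pixel
    copies the input pixel at the proportionally scaled coordinate.
    """
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


# A tiny 2x2 "frame" upscaled to the 224x224 input size.
small = [[1, 2], [3, 4]]
resized = resize_nearest(small)
print(len(resized), len(resized[0]))
```

In practice an image library would normally handle this step, including the choice of interpolation.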

[0099] 2) For questions and answers, the present invention uses a pre-trained word2vec model to extract their semantic representations. In particular, the word set contains 5000 words, and the word vectors are 256-dimensional.

[0100] 3) For the vocabulary size, we set it to 8500 and add " " and " " to encode sentence endings and words that are not in the vocabulary, respect...
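A capped vocabulary with two special tokens, one marking sentence endings and one standing in for out-of-vocabulary words, might be built as in the sketch below. The token strings `<eos>` and `<unk>` are assumed names (the originals are not legible in the text above), and whether the limit of 8500 includes the special tokens is also an assumption; here it does.

```python
from collections import Counter

EOS = "<eos>"  # assumed name for the sentence-ending token
UNK = "<unk>"  # assumed name for the out-of-vocabulary token


def build_vocab(sentences, max_size=8500):
    """Map the most frequent words to integer ids, reserving two special ids."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    # Assumed: the size limit counts the two special tokens as well.
    words = [w for w, _ in counts.most_common(max_size - 2)]
    return {w: i for i, w in enumerate([EOS, UNK] + words)}


def encode(sentence, vocab):
    """Convert a sentence to ids, mapping unknown words to UNK and appending EOS."""
    ids = [vocab.get(w, vocab[UNK]) for w in sentence.lower().split()]
    ids.append(vocab[EOS])
    return ids


vocab = build_vocab(["a man is cooking", "a dog runs"])
print(encode("a cat runs", vocab))
```

Whitespace tokenization and lowercasing are simplifications; the tokenizer used in the experiments is not described.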


Abstract

The present invention discloses a method for solving open-ended long video question answering using a hierarchical codec network mechanism. The method mainly comprises the following steps: 1) for a training set of videos, questions, and answers, training an adaptive hierarchical encoding neural network that segments the long video using an adaptive segmentation mechanism learned from the question and the video, thereby obtaining a joint representation of the video segments and the question; 2) taking the joint representation of the video and the question output by the encoding neural network, together with the corresponding answers, training a decoding neural network using the idea of reinforcement learning, so that the decoder outputs a natural language answer for the joint representation of the video and the question. Compared with general video question answering solutions, the method uses question-based adaptive layering, which better locates the segments of a long video that are useful for answering the question and better reflects the characteristics of the video, and it uses a reinforcement learning mechanism to train the decoder, which yields a stronger decoder and answers that better meet the requirements. On long video question answering, the method achieves better results than traditional methods.
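To make the idea of segmenting a long video and summarizing each segment concrete, the sketch below groups per-frame feature vectors into segments at given boundary positions and averages each segment. It is only an illustration under stated assumptions: in the method the boundaries would come from the learned, question-conditioned segmentation mechanism, whereas here they are supplied as input, and mean pooling and the function name `pool_segments` are choices made for this example rather than details given in the abstract.

```python
def pool_segments(frame_feats, boundaries):
    """Group frame features into segments and mean-pool each segment.

    frame_feats: list of equal-length feature vectors, one per frame.
    boundaries:  list of 0/1 flags; boundaries[t] == 1 means a segment
                 ends at frame t (assumed to come from a learned gate).
    """
    segments, current = [], []
    for feat, is_end in zip(frame_feats, boundaries):
        current.append(feat)
        if is_end:
            segments.append(current)
            current = []
    if current:  # trailing frames form a final segment
        segments.append(current)
    # Average each feature dimension over the frames of a segment.
    return [[sum(col) / len(seg) for col in zip(*seg)] for seg in segments]


feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(pool_segments(feats, [0, 1, 0]))
```

The segment-level vectors produced this way are the kind of input a higher encoder layer could combine with the question representation.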

Description

Technical field

[0001] The present invention relates to answer generation for video question answering, and in particular to a method for generating answers to video-related questions using a hierarchical codec network mechanism.

Background technique

[0002] Open-ended video question answering is an important problem in the field of video information retrieval. Its goal is to automatically generate answers for a given video and a corresponding question. Open-ended video question answering is a fundamental problem of visual question answering, in which natural language answers are generated automatically from the referenced video content according to a given question.

[0003] Most current video question answering methods focus on short video question answering, and most of them learn the semantic representation of the video from an LSTM network layer and then generate the answer. Although the current technology has achieved good results for s...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06N3/04, G06N3/08
CPC: G06N3/08, G06N3/045
Inventor: 俞新荣
Owner: HANGZHOU YIWISE INTELLIGENT TECH CO LTD