Method for solving video question and answer by using hierarchical codec network mechanism

A layered encoder-decoder technology, applied to video question answering and answer generation, that addresses the lack of video semantic feature modeling.

Active Publication Date: 2018-11-06

HANGZHOU YIWISE INTELLIGENT TECH CO LTD



Problems solved by technology

[0006] The purpose of the present invention is to solve the problems in the prior art. Existing approaches to long video question answering do not adequately model the semantic features of the video: a long video contains content with different semantics across its frames, and that content is scattered over different segments of the video. To address this, the present invention provides a method for solving open-ended long video question answering using an adaptive hierarchical codec network mechanism trained with reinforcement learning.

[0007] The layered codec network mechanism is used to solve the open long video question answering problem, which includes the following steps:



Embodiment

[0097] The present invention is experimentally verified on a self-constructed dataset containing 50,000 video clips and 200,000 text descriptions. We use 70% of the data as the training set, 10% as the validation set, and 20% as the test set:

[0098] 1) For each video in the dataset, all frames are used as the frame-level representation of that video. Each frame is resized to 224×224, and a pre-trained VGGNet is then used to obtain a 4096-dimensional feature representation of each frame.

[0099] 2) For questions and answers, the present invention uses a pre-trained word2vec model to extract their semantic representations. Specifically, the word set contains 5,000 words, and each word vector has 256 dimensions.

[0100] 3) The vocabulary size is set to 8,500, and two special tokens are added to encode sentence endings and words that are not in the vocabulary, respect...
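The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the patent's implementation, and it rests on several assumptions: the 70/10/20 split is applied at the video-clip level, frames are resized with nearest-neighbour sampling (the source does not name an interpolation method), the vocabulary limit of 8,500 is taken to include the two special tokens, and the token names `<eos>` and `<unk>` are hypothetical placeholders because the original token text is not present in the source. VGGNet feature extraction and word2vec lookup are omitted, since both depend on pre-trained models.

```python
import random
from collections import Counter

EOS = "<eos>"  # hypothetical name for the sentence-end token
UNK = "<unk>"  # hypothetical name for the out-of-vocabulary token


def split_dataset(items, train_pct=70, val_pct=10, seed=0):
    """Shuffle items and split them into train/validation/test lists.

    Whatever is left after the train and validation shares goes to test.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def resize_nearest(frame, out_h=224, out_w=224):
    """Resize a frame (a list of pixel rows) with nearest-neighbour sampling."""
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def build_vocab(sentences, max_size=8500):
    """Map the most frequent words to ids, reserving ids 0 and 1 for EOS/UNK."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    words = [w for w, _ in counts.most_common(max_size - 2)]
    return {tok: i for i, tok in enumerate([EOS, UNK] + words)}


def encode(sentence, vocab):
    """Turn a sentence into token ids, mapping unknown words to UNK and appending EOS."""
    ids = [vocab.get(w, vocab[UNK]) for w in sentence.lower().split()]
    return ids + [vocab[EOS]]


# 50,000 clip ids split 70/10/20.
train, val, test = split_dataset(range(50_000))
print(len(train), len(val), len(test))  # 35000 5000 10000

# Toy 448x672 "frame" whose pixel value records its original position.
frame = [[(r, c) for c in range(672)] for r in range(448)]
resized = resize_nearest(frame)
print(len(resized), len(resized[0]))  # 224 224

# "dog" is not in the toy corpus, so it is encoded as UNK.
vocab = build_vocab(["what is the man doing", "the man is cooking"])
ids = encode("what is the dog doing", vocab)
```

In a real pipeline the resized frames would be passed to the pre-trained VGGNet to obtain the per-frame features, and the token ids would index into the word2vec embedding table.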


Abstract

The present invention discloses a method for solving open-ended long video question answering using a hierarchical codec network mechanism, which mainly comprises the following steps: 1) for a training set of videos, questions, and answers, training an adaptive hierarchical encoding neural network that segments the long video through an adaptive segmentation mechanism learned from the question and the video, thereby obtaining a joint representation of the video segments and the question; 2) taking the joint representation of the video and the question output by the encoding neural network, training a decoding neural network with the relevant answers and the idea of reinforcement learning, so that it outputs the natural language answer corresponding to the joint representation of the video and the question. Compared with general video question answering solutions, the present invention uses question-based adaptive layering, which better locates the segments of a long video that are useful for answering the question and better reflects the characteristics of the video. It also uses a reinforcement learning mechanism to train the decoder, which yields a stronger decoder and answers that better meet the requirements. On long video question answering, the method achieves better results than traditional methods.
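The abstract states only that the decoder is trained with "the idea of reinforcement learning"; it does not name the algorithm or the reward. As a rough illustration of what such training can look like, the sketch below applies a REINFORCE-style policy-gradient update to a single decoding step over a four-word toy vocabulary. REINFORCE is a generic stand-in chosen here, and the reward function, learning rate, and vocabulary size are all hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, reward_fn, lr=0.5, rng=random):
    """Sample one token, observe its reward, and update the logits in place."""
    probs = softmax(logits)
    token = rng.choices(range(len(logits)), weights=probs)[0]
    reward = reward_fn(token)
    # For a softmax policy: d log pi(token) / d logit_k = 1[k == token] - probs[k]
    for k in range(len(logits)):
        grad_log_prob = (1.0 if k == token else 0.0) - probs[k]
        logits[k] += lr * reward * grad_log_prob
    return token, reward


rng = random.Random(0)
logits = [0.0, 0.0, 0.0, 0.0]  # uniform policy over a 4-word toy vocabulary
target = 2                      # hypothetical "correct" answer word

for _ in range(200):
    # Hypothetical reward: 1 when the sampled word matches the target, else 0.
    reinforce_step(logits, lambda t: 1.0 if t == target else 0.0, rng=rng)

print(softmax(logits)[target])
```

Because only rewarded samples change the logits, the probability of the target word never decreases in this toy setup. A full decoder would apply the same kind of update across every step of a generated answer, with a reward computed on the whole sentence.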

Description

Technical field

[0001] The present invention relates to answer generation for video question answering, and in particular to a method for generating answers to video-related questions using a layered codec network mechanism.

Background technique

[0002] Open-ended video question answering is an important problem in the field of video information retrieval. The goal of this problem is to automatically generate answers for given videos and their corresponding questions. Open-ended video question answering is a fundamental problem of visual question answering: it automatically generates a natural language answer from the referenced video content according to a given question.

[0003] Most current video question answering methods focus on short videos. Most of them learn the semantic representation of the video from an LSTM network layer and then generate the answer. Although the current technology has achieved good results for s...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F17/30, G06N3/04, G06N3/08

CPC: G06N3/08, G06N3/045

Inventor: 俞新荣

Owner: HANGZHOU YIWISE INTELLIGENT TECH CO LTD
