Video question answering method based on self-driven twinborn sampling and reasoning

A self-driven twin sampling and reasoning technology, applied in inference methods, video data retrieval, video data indexing and the like; it solves the problem that context information is not well utilized and achieves an enhanced learning effect.

Pending Publication Date: 2022-03-22
SUN YAT SEN UNIV

AI Technical Summary

Problems solved by technology

[0006] Aiming at the deficiencies of the prior art, the present invention aims to provide a video question answering method based on self-driven twin sampling and reasoning, which solves the problem that the context information in the same video cannot be well utilized.



Examples


Embodiment

[0062] Figure 1 shows the overall framework of the invention. For a video F, the present invention constructs a reference video segment c_anchor by sampling, together with the corresponding twin video segments {c_i} (i = 1, ..., N), where N represents the number of all video segments. Each video segment c has a length of B frames, and B feature maps are obtained through the video encoder. The encoded features of the reference segment and the twin segments are denoted as v_anchor and {v_i} (i = 1, ..., N), respectively.
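As a rough illustration of this sampling step, the following Python sketch draws one anchor segment and its twin segments of B frame indices each from a single video. The helper name and the exact frame-indexing scheme are assumptions for illustration, not the patented sampling rule.

```python
import numpy as np

def sample_segments(num_frames, num_segments, seg_len, rng=None):
    """Split a video of num_frames frames into num_segments temporal windows,
    sparsely sample seg_len (B) frame indices inside each window, pick one
    window as the reference (anchor) segment and treat the rest as twins."""
    rng = rng or np.random.default_rng()
    windows = np.array_split(np.arange(num_frames), num_segments)
    segments = [np.sort(rng.choice(w, size=seg_len, replace=len(w) < seg_len))
                for w in windows]
    anchor_idx = int(rng.integers(num_segments))                      # index of c_anchor
    twins = [s for i, s in enumerate(segments) if i != anchor_idx]    # {c_i}
    return segments[anchor_idx], twins

anchor_frames, twin_frames = sample_segments(num_frames=300, num_segments=4, seg_len=16)
```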

[0063] The text representation produced by the text encoder and the video segment representation produced by the video encoder are stitched together as input, and a multimodal Transformer is used to generate video segment-text features. The reference video segment-text feature is denoted as f_anchor, and the corresponding twin video segment-text features are denoted as {f_i}.
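A minimal PyTorch sketch of this fusion step is given below, assuming the encoders output fixed-dimensional token features; the module sizes and the mean pooling are illustrative choices, not the configuration specified by the invention.

```python
import torch
import torch.nn as nn

class SegmentTextFusion(nn.Module):
    """Stitch text-encoder tokens and video-encoder frame features together
    and fuse them with a Transformer encoder into a segment-text feature."""
    def __init__(self, dim=512, num_layers=2, num_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, text_feats, video_feats):
        # text_feats: (batch, T_text, dim), video_feats: (batch, B, dim)
        tokens = torch.cat([text_feats, video_feats], dim=1)  # concatenate both modalities
        fused = self.encoder(tokens)                           # multimodal Transformer
        return fused.mean(dim=1)                               # pooled f_anchor / f_i

fusion = SegmentTextFusion()
f_anchor = fusion(torch.randn(1, 20, 512), torch.randn(1, 16, 512))   # -> (1, 512)
```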

[0064] Similarly, the soft label predictions inferred from these features are denoted as p_anchor and {p_i}, where each p is a vector of dimension ...
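One simple way to realize such soft label predictions is a linear classification head followed by a softmax over the candidate answers, as in the sketch below; the answer-vocabulary size and head architecture are placeholder assumptions.

```python
import torch
import torch.nn as nn

num_answers = 1000                 # assumed size of the candidate-answer set
head = nn.Linear(512, num_answers)

def predict_soft_labels(fused_feature):
    # fused_feature: (batch, 512) segment-text feature -> probability vector over answers
    return torch.softmax(head(fused_feature), dim=-1)

p_anchor = predict_soft_labels(torch.randn(1, 512))   # soft label for the anchor segment
```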


Abstract

The invention discloses a video question answering method based on self-driven twin sampling and reasoning. The method comprises video segment sampling, feature extraction and a reasoning strategy. The video segment sampling obtains a reference video segment through sparse sampling and obtains twin video segments through twin sampling. The feature extraction encodes a plurality of video segment-text pairs into corresponding semantic feature representations through a video encoder, a text encoder and a multimodal Transformer. In the reasoning strategy, a twin knowledge generation module generates refined knowledge labels for the video segments, and a twin knowledge reasoning module propagates the labels to all twin samples and fuses them. The beneficial effect of the method is that a framework based on self-driven twin sampling and reasoning is provided for extracting contextual semantic information from different video segments of the same video, thereby enhancing the learning effect of the network.
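To make the propagate-and-fuse idea of the reasoning strategy concrete, the sketch below fuses the soft labels of the anchor segment and its twin segments by simple averaging. This is only an assumed illustration of the data flow; the actual generation and fusion rules of the twin knowledge modules are not reproduced here.

```python
import torch

def fuse_twin_labels(p_anchor, p_twins):
    """p_anchor: (num_answers,) soft label of the reference segment;
    p_twins: list of (num_answers,) soft labels of its twin segments.
    Returns one refined label shared by all twin samples."""
    stacked = torch.stack([p_anchor] + list(p_twins), dim=0)
    fused = stacked.mean(dim=0)        # propagate and fuse by simple averaging (assumed rule)
    return fused / fused.sum()         # keep the result a valid probability distribution

refined_label = fuse_twin_labels(torch.rand(1000).softmax(-1),
                                 [torch.rand(1000).softmax(-1) for _ in range(3)])
```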

Description

Technical field

[0001] The invention relates to the technical fields of computer vision and pattern recognition, and in particular to a video question answering method based on self-driven twin sampling and reasoning.

Background technique

[0002] Video question answering is a visual-language reasoning task with great application prospects, and it has therefore attracted increasing attention from researchers. The video question answering task needs to acquire and use the temporal and spatial features of the visual signal in the video, guided by the combined semantics of linguistic cues, to generate answers.

[0003] Some existing works extract general visual information as well as motion features from videos to represent video content, and design different attention mechanisms to integrate these features. These methods focus on how to better understand the overall content of the video, but they easily ignore the details within individual video segments. Some researchers have also explored...


Application Information

Patent Timeline
IPC (IPC8): G06F16/783; G06F16/78; G06F16/75; G06F16/71; G06F16/332; G06V20/40; G06V10/764; G06K9/62; G06N5/04
CPC: G06F16/783; G06F16/7867; G06F16/75; G06F16/71; G06F16/3329; G06N5/04; G06F18/2415
Inventor: 余伟江, 卢宇彤, 李孟非, 陈志广
Owner: SUN YAT SEN UNIV