A Method for Multi-Event Natural Language Description in Video Based on Event Relation Coding

A natural language description technology based on event relationships, applied to fields such as computer components, instruments, and biological neural network models. It addresses the inability to obtain relationships between events and the resulting loss of accuracy and naturalness in the descriptive language, giving more accurate output and reducing information loss.

Active Publication Date: 2021-07-02

SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV


AI Technical Summary

Problems solved by technology

However, existing methods have several shortcomings when applied to the task of describing multi-event videos.

The main problems are: 1) when describing multi-event videos, these methods cannot obtain the relationships between events; 2) for video clips that differ greatly in length, a unified encoder-decoder architecture does not perform well.

These two shortcomings reduce the accuracy and naturalness of the descriptive language.


Embodiment Construction

[0020] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0021] The specific embodiment of the present invention proposes an algorithm, oriented to event relation coding, for describing multiple events in a video in natural language. Referring to Figure 1, the algorithm includes the following steps S1 to S4:

[0022] S1. A three-dimensional convolutional neural network is used to extract depth features from a given video sequence, yielding several depth feature vectors that form a depth feature sequence. For a given video sequence, the operation that produces a description from the video sequence and an event proposal can be written as: [formula not recoverable from the source], where the output is the word sequence of the sentence, p = {p_start, p_end} is the start and end interval of a given event, and the input is the depth feature sequence of the video sequence.

[0023] In order to obtain the depth feature sequence of the video sequence, first, for the given video sequenc...
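As a rough illustration of step S1, the sketch below cuts a frame sequence into fixed-length clips and maps each clip to one feature vector, so that the video becomes a depth feature sequence. It is only a sketch under stated assumptions: the clip length, the stride, the flattened-frame representation, and every function name are hypothetical, and the three-dimensional convolutional neural network is replaced by a trivial temporal average so that the example stays self-contained.

```python
from typing import List, Sequence

Frame = List[float]  # hypothetical: one frame flattened to a list of values


def split_into_clips(frames: Sequence[Frame], clip_len: int, stride: int) -> List[List[Frame]]:
    """Cut the frame sequence into fixed-length, possibly overlapping clips."""
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")
    last_start = len(frames) - clip_len
    return [list(frames[s:s + clip_len]) for s in range(0, last_start + 1, stride)]


def placeholder_clip_feature(clip: Sequence[Frame]) -> List[float]:
    """Stand-in for the 3D CNN (assumption): average each value over time."""
    n = len(clip)
    dim = len(clip[0])
    return [sum(frame[d] for frame in clip) / n for d in range(dim)]


def depth_feature_sequence(frames: Sequence[Frame], clip_len: int = 16, stride: int = 8) -> List[List[float]]:
    """Map a video (list of frames) to a sequence of per-clip feature vectors."""
    clips = split_into_clips(frames, clip_len, stride)
    return [placeholder_clip_feature(clip) for clip in clips]


if __name__ == "__main__":
    # Toy video: 8 frames with 2 values each.
    video = [[float(i), 0.0] for i in range(8)]
    for vector in depth_feature_sequence(video, clip_len=4, stride=2):
        print(vector)
```

In a real system the placeholder would be replaced by the forward pass of a trained three-dimensional convolutional network; the surrounding clip logic would stay the same.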


Abstract

The present invention discloses an algorithm, oriented to event relationship encoding, for describing multiple events in a video in natural language. It includes the following steps: S1. using a three-dimensional convolutional neural network to extract depth features from a given video sequence, obtaining several depth feature vectors that form a depth feature sequence; S2. based on the depth feature sequence, using a recurrent neural network as a time series analysis method to calculate the proposed start and end intervals of events in the video sequence; S3. selecting an event to be described in the video sequence, and re-encoding the subsequence of the depth feature sequence that corresponds to that event, according to the event's proposed start and end interval, to obtain a descriptor of the event; S4. using an LSTM adaptive decoder based on an attention model to decode the descriptor and obtain natural language describing the event.
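The data flow of steps S3 and S4 can be sketched in a few lines. The example below only selects the feature subsequence that belongs to a proposed interval and performs a single soft-attention step over it. It does not implement the re-encoding or the LSTM decoder described in the abstract. The half-open index convention for the interval, the dot-product scoring, and all function names are assumptions made for illustration, since the source does not specify them.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def event_subsequence(features: Sequence[Vector], proposal: Tuple[int, int]) -> List[Vector]:
    """Select features inside a proposal (p_start, p_end), treated as half-open indices (assumption)."""
    start, end = proposal
    if not 0 <= start < end <= len(features):
        raise ValueError("proposal must satisfy 0 <= p_start < p_end <= len(features)")
    return [list(v) for v in features[start:end]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(query: Vector, keys: Sequence[Vector]) -> Tuple[Vector, List[float]]:
    """One soft-attention step with dot-product scoring (assumption): return context vector and weights."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(keys[0])
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
    return context, weights


if __name__ == "__main__":
    feature_sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
    event = event_subsequence(feature_sequence, (1, 3))
    # The query stands in for a decoder hidden state (hypothetical values).
    context, weights = attention_context([0.0, 0.0], event)
    print(context, weights)
```

In a decoder, a step like this would be repeated for every generated word, with the query taken from the current decoder state, so that each word can focus on different parts of the event's features.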

Description

Technical field

[0001] The invention relates to the technical field of natural language description, in particular to an algorithm for detecting events in videos and describing the events in natural language.

Background

[0002] Visual natural language description (captioning) is the task of converting visual information into natural language. This task usually uses an encoder-decoder architecture as its key technology. As the main steps in this process, the quality of the features output by the encoder and the generative model of the decoder both have a significant impact on the final natural language result. Visual natural language description has been explored for both video and images. In general, state-of-the-art methods use neural-network-based computational models to implement the architecture. On images, convolutional neural networks achieve good results in many visual understanding tasks, and work on the image description task often uses this ...


Application Information


Patent Type & Authority: Patents (China)

IPC(8): G06K9/00, G06N3/04

CPC: G06N3/049, G06V20/41, G06V20/44, G06V20/46

Inventor: 袁春, 杨大力

Owner: SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV
