Multi-event natural language description algorithm in video with event relation coding orientation

A technology relating to natural language and event relationships, applied in computing, computer components, instruments, etc. It addresses the inability of existing methods to obtain relationships between events, their unsatisfactory results, and the resulting loss of accuracy and naturalness in the description language, and it achieves accurate output with less information loss.

Active Publication Date: 2018-12-07

SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV

Cites: 3. Cited by: 19.


Problems solved by technology

However, existing methods each have shortcomings when describing multi-event videos.

The main problems are: 1) when describing multi-event videos, these methods cannot capture the relationships between events; 2) for video clips that differ greatly in length, a unified encoder-decoder architecture does not give ideal results.

These two shortcomings reduce the accuracy and naturalness of the description language.


Embodiment Construction

[0020] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0021] The specific embodiment of the present invention proposes an event-relation-coding-oriented algorithm for multi-event natural language description in video. Referring to Figure 1, the algorithm includes the following steps S1 to S4:

[0022] S1. A three-dimensional convolutional neural network is used to extract deep features from a given video sequence, yielding several deep feature vectors that form a deep feature sequence. For a given video sequence, the sentence describing an event is obtained from the video sequence and the event proposal. In this formulation, the output is the word sequence of the sentence, p = {p_start, p_end} is the start-end interval of a given event, and the video sequence is represented by its deep feature sequence.
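As a rough illustration of step S1 and of the interval p = {p_start, p_end}, the sketch below cuts a video into fixed-length clips, maps each clip to one feature vector, and then picks the feature vectors that fall inside an event interval. Everything here is an assumption made for illustration: the clip length of 16 frames, the stride, the function names, and above all `toy_clip_feature`, which simply averages frames and only stands in for a real three-dimensional convolutional network.

```python
from typing import List, Sequence, Tuple

Frame = List[float]    # hypothetical: one frame flattened to a list of values
Feature = List[float]  # one feature vector per clip


def split_into_clips(frames: Sequence[Frame], clip_len: int = 16,
                     stride: int = 16) -> List[Sequence[Frame]]:
    """Cut the video into fixed-length clips; a trailing partial clip is dropped."""
    last_start = len(frames) - clip_len
    return [frames[i:i + clip_len] for i in range(0, last_start + 1, stride)]


def toy_clip_feature(clip: Sequence[Frame]) -> Feature:
    """Stand-in for a 3D CNN (assumption): element-wise mean over the clip's frames."""
    n = len(clip)
    return [sum(column) / n for column in zip(*clip)]


def feature_sequence(frames: Sequence[Frame], clip_len: int = 16,
                     stride: int = 16) -> List[Feature]:
    """Build the feature sequence: one vector per clip, in temporal order."""
    return [toy_clip_feature(c) for c in split_into_clips(frames, clip_len, stride)]


def event_subsequence(features: Sequence[Feature], p: Tuple[int, int],
                      clip_len: int = 16, stride: int = 16) -> List[Feature]:
    """Keep features of clips lying fully inside the frame interval [p_start, p_end)."""
    p_start, p_end = p
    return [f for k, f in enumerate(features)
            if k * stride >= p_start and k * stride + clip_len <= p_end]


# Usage: a toy 64-frame "video" whose frames hold two values each.
frames = [[float(t), 2.0 * t] for t in range(64)]
feats = feature_sequence(frames)
event_feats = event_subsequence(feats, (16, 48))
```

Measuring the interval in frames and requiring a clip to lie fully inside it are design choices of this sketch; the source does not say how the interval is aligned with the feature sequence.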

[0023] In order to obtain the depth feature sequence of the video sequence, first, for the given video sequenc...


Abstract

The invention discloses an event-relation-coding-oriented algorithm for multi-event natural language description in video. The algorithm is implemented as follows: S1, deep features are extracted from a given video sequence by using a three-dimensional convolutional neural network to obtain a plurality of deep feature vectors, which form a deep feature sequence; S2, on the basis of the deep feature sequence, proposed start-stop intervals of events in the video sequence are calculated by using a recurrent neural network as a time sequence analysis method; S3, an event to be described is selected in the video sequence, and the sub-sequence of the deep feature sequence corresponding to that event is encoded again, based on the start-stop interval of the event, to obtain a descriptor of the event; and S4, the descriptor is decoded by using an attention-model-based LSTM adaptive decoder to obtain natural language describing the event.
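To make the attention idea in step S4 more concrete, here is a minimal sketch of one soft-attention step over a set of feature vectors. It is not the patented decoder: the dot-product scoring, the function names `softmax` and `attend`, and the plain-list vectors are all assumptions, and the LSTM update and the adaptive part of the decoder are left out.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(state: Sequence[float],
           features: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """One soft-attention step (assumed dot-product scoring).

    Scores each feature vector against the decoder state, normalizes the
    scores into weights, and returns the weights with the weighted-sum context.
    """
    scores = [sum(h * v for h, v in zip(state, f)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context


# Usage: two orthogonal feature vectors and two different decoder states.
features = [[1.0, 0.0], [0.0, 1.0]]
uniform_weights, uniform_context = attend([0.0, 0.0], features)
focused_weights, focused_context = attend([10.0, 0.0], features)
```

With a zero state every feature scores the same, so the weights are uniform; a state aligned with the first feature shifts almost all weight onto it. In a full decoder the context vector would be fed, together with the previous word, into the LSTM at each generation step.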

Description

Technical field

[0001] The invention relates to the technical field of natural language description, in particular to an algorithm for detecting events in videos and describing them in natural language.

Background art

[0002] Visual natural language description (captioning) is the task of converting visual information into natural language. This task usually uses an encoder-decoder architecture as its key technology. As the main steps in this process, the quality of the features output by the encoder and the generative model of the decoder have a significant impact on the final natural language result. Visual natural language description has been explored for both video and images. In general, state-of-the-art methods use neural-network-based computational models to implement the architecture. For images, convolutional neural networks perform well in many visual understanding tasks, and work on image description often uses this ...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06K9/00, G06N3/04

CPC: G06N3/049, G06V20/41, G06V20/44, G06V20/46

Inventors: 袁春, 杨大力

Owner: SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV
