Event-relation-coding-oriented algorithm for natural language description of multiple events in video

A technique for natural language description based on event relations, applied in the fields of computing, computer components, instruments, etc. It addresses the inability to capture relationships between events, unsatisfactory results, and the resulting decline in the accuracy and naturalness of the generated descriptions, achieving more accurate output with less information loss.

Active Publication Date: 2018-12-07
SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV

AI Technical Summary

Problems solved by technology

However, existing methods each fall short on the task of describing multi-event videos.
The main problems are: 1) when describing multi-event videos, these methods cannot capture the relationships between events; 2) for video clips that differ greatly in length, a single unified encoder-decoder architecture does not perform well.
Together, these two shortcomings reduce the accuracy and naturalness of the generated descriptions.

Embodiment Construction

[0020] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0021] This embodiment of the present invention proposes an event-relation-coding-oriented algorithm for natural language description of multiple events in video. Referring to Figure 1, the algorithm includes the following steps S1 to S4:

[0022] S1. A three-dimensional convolutional neural network is used to extract deep features from a given video sequence, yielding several deep feature vectors that form a deep feature sequence. For a given video sequence, the operation that produces a description from the video sequence and an event proposal can be written as: [equation missing in source] where [symbol missing] is the word sequence of the sentence, p = {p_start, p_end} is the start-stop interval of a given event, and [symbol missing] represents the deep feature sequence of the video sequence.
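
As a rough illustration of step S1, the sketch below splits a video into fixed-length clips, maps each clip to one feature vector, and then slices the resulting deep feature sequence with an event proposal p = {p_start, p_end}. Everything here is an assumption made for illustration: the names `clip_ranges`, `mean_encoder`, `feature_sequence` and `proposal_subsequence` are hypothetical, the clip length and stride are arbitrary, the averaging encoder is only a stand-in for a three-dimensional convolutional neural network, and the proposal is assumed to be expressed in feature-sequence indices, which the source does not specify.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]   # hypothetical: one video frame flattened to numbers
Feature = List[float]     # one deep feature vector


def clip_ranges(num_frames: int, clip_len: int, stride: int) -> List[Tuple[int, int]]:
    """Return the [start, end) frame ranges of all full-length clips."""
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")
    return [(s, s + clip_len) for s in range(0, num_frames - clip_len + 1, stride)]


def mean_encoder(clip: Sequence[Frame]) -> Feature:
    """Stand-in for a 3D CNN: average every frame dimension over the clip."""
    dim = len(clip[0])
    return [sum(frame[d] for frame in clip) / len(clip) for d in range(dim)]


def feature_sequence(
    frames: Sequence[Frame],
    clip_len: int,
    stride: int,
    encoder: Callable[[Sequence[Frame]], Feature] = mean_encoder,
) -> List[Feature]:
    """Encode each clip to one vector; the vectors form the feature sequence."""
    return [encoder(frames[s:e]) for s, e in clip_ranges(len(frames), clip_len, stride)]


def proposal_subsequence(features: Sequence[Feature], p_start: int, p_end: int) -> List[Feature]:
    """Features covered by a proposal, assuming inclusive feature indices."""
    return list(features[p_start:p_end + 1])


# Toy video: 8 frames, each with 2 values.
frames = [[float(i), float(2 * i)] for i in range(8)]
features = feature_sequence(frames, clip_len=4, stride=2)
event_features = proposal_subsequence(features, p_start=1, p_end=2)
```

In a real system the `encoder` argument would be replaced by a trained 3D convolutional network; keeping it as a parameter lets the clip bookkeeping be checked independently of the model.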

[0023] To obtain the deep feature sequence of the video sequence, first, for the given video sequenc...

Abstract

The invention discloses an event-relation-coding-oriented algorithm for natural language description of multiple events in video. The algorithm is implemented as follows: S1, deep features are extracted from a given video sequence using a three-dimensional convolutional neural network, yielding a plurality of deep feature vectors that form a deep feature sequence; S2, based on the deep feature sequence, the proposed start-stop intervals of events in the video sequence are calculated using a recurrent neural network as the temporal analysis method; S3, an event to be described is selected in the video sequence, and the sub-sequence of the deep feature sequence corresponding to that event is re-encoded based on the event's start-stop interval to obtain a descriptor of the event; and S4, the descriptor is decoded using an attention-model-based LSTM adaptive decoder to obtain natural language describing the event.
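
To make the attention idea in step S4 more concrete, here is a minimal sketch of one soft-attention step over the features of a selected event. It is not the patented decoder: the LSTM itself is omitted, the dot-product scoring function is an assumption (the source does not state how attention scores are computed), and the names `softmax`, `attend`, `event_features` and `decoder_state` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    query: Sequence[float], features: Sequence[Sequence[float]]
) -> Tuple[List[float], List[float]]:
    """One soft-attention step.

    Scores each feature vector against the query by dot product (an
    assumed scoring function), normalizes the scores with softmax, and
    returns the weights together with the weighted-sum context vector.
    """
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, features)) for d in range(dim)
    ]
    return weights, context


# Toy event sub-sequence of three 2-dimensional feature vectors.
event_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [10.0, 0.0]  # hypothetical decoder hidden state used as the query
weights, context = attend(decoder_state, event_features)
```

With this query the first and third feature vectors receive nearly all of the weight, since only they have a non-zero first component. In a full decoder the context vector would be fed into the LSTM at each step before the next word is predicted.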

Description

Technical field

[0001] The invention relates to the technical field of natural language description, and in particular to an algorithm for detecting events in videos and describing those events in natural language.

Background

[0002] Visual natural language description (captioning) is the task of converting visual information into natural language. This task usually relies on an encoder-decoder architecture as its key technology. As the main steps in this process, the quality of the features output by the encoder and the generative model of the decoder both have a significant impact on the final natural language result. Visual natural language description has been explored for both video and images. In general, state-of-the-art approaches use neural-network-based computational models to implement these architectures. For images, convolutional neural networks achieve good results in many visual understanding tasks, and work on image description often uses this ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/00, G06N3/04
CPC: G06N3/049, G06V20/41, G06V20/44, G06V20/46
Inventors: 袁春, 杨大力
Owner: SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV