Video content description method by means of space and time attention models

A technology relating to attention models and video content description, applied in neural learning methods, biological neural network models, and character and pattern recognition, which addresses problems such as the loss and neglect of key information.

Active Publication Date: 2017-08-18
HANGZHOU DIANZI UNIV

AI Technical Summary

Problems solved by technology

[0008] In order to overcome the problem that existing video content description methods ignore key information because the spatial structure within each frame is lost, and to further improve the accuracy of the description, the present invention adds a spatial attention model, yielding a new video content description method based on a spatio-temporal attention model.


Examples


Embodiment

[0096] With reference to Figure 2, a specific example of training and testing the video content description method is given below; the detailed calculation process is as follows:

[0097] (1) A certain video contains a total of 430 frames. First, the video format is preprocessed: the video to be described is converted into a set of pictures by sampling 10% of its frames (one picture every 10 frames), yielding 43 pictures;
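As a concrete sketch of this preprocessing step, the snippet below samples every tenth frame with OpenCV. The function name, the file path, and the use of OpenCV are illustrative assumptions, not part of the patent.

```python
import cv2  # OpenCV, assumed here as the video-decoding backend

def sample_frames(video_path, keep_ratio=0.1):
    """Keep a fixed fraction of frames, e.g. 43 pictures from a 430-frame video."""
    step = max(1, round(1 / keep_ratio))  # 10% of the frames -> one frame every 10
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames

pictures = sample_frames("example.mp4")  # a 430-frame clip yields 43 pictures
```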

[0098] (2) Use the pre-trained convolutional neural networks GoogLeNet, Faster R-CNN and C3D to extract the global features, local features and dynamic features of the whole video from the 43 pictures, and combine the global features and the dynamic features by the cascade (concatenation) method listed in formula (1);
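A minimal sketch of this extraction-and-cascade step, assuming PyTorch/torchvision: torchvision's ImageNet-pretrained GoogLeNet stands in for the patent's network, while the Faster R-CNN region features and C3D motion features are placeholder tensors (C3D is not bundled with torchvision). All dimensions are illustrative.

```python
import torch
import torchvision.models as models

# torchvision's ImageNet-pretrained GoogLeNet stands in for the patent's
# global-feature extractor; the exact weights used in the patent are unknown.
googlenet = models.googlenet(weights="IMAGENET1K_V1")
googlenet.fc = torch.nn.Identity()  # expose the 1024-d pooled feature, not class logits
googlenet.eval()

with torch.no_grad():
    pics = torch.randn(43, 3, 224, 224)      # the 43 sampled pictures, preprocessed
    global_feats = googlenet(pics)           # (43, 1024) per-picture global features
    local_feats = torch.randn(43, 36, 2048)  # placeholder: 36 Faster R-CNN regions per picture
    dynamic_feats = torch.randn(43, 4096)    # placeholder for C3D motion features

# Formula (1) is described as a cascade (concatenation) of global and dynamic features:
fused = torch.cat([global_feats, dynamic_feats], dim=1)  # (43, 1024 + 4096)
```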

[0099] (3) According to the methods listed in formulas (2)-(5), calculate the spatial representation of the local features on each picture;
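Formulas (2)-(5) are not reproduced on this page, so the following is only a generic soft spatial-attention sketch over per-region local features; the module name, dimensions, and the additive scoring form are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    """Soft attention over the R region features of one picture, in the spirit of
    formulas (2)-(5), whose exact form is not shown on this page."""
    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.w_feat = nn.Linear(feat_dim, hidden_dim)
        self.w_hid = nn.Linear(hidden_dim, hidden_dim)
        self.v = nn.Linear(hidden_dim, 1)

    def forward(self, regions, h):
        # regions: (R, feat_dim) local features of one picture; h: (hidden_dim,) decoder state
        scores = self.v(torch.tanh(self.w_feat(regions) + self.w_hid(h))).squeeze(-1)  # (R,)
        alpha = F.softmax(scores, dim=0)                    # attention weights over regions
        return (alpha.unsqueeze(-1) * regions).sum(dim=0)   # weighted spatial representation

attend = SpatialAttention(feat_dim=2048, hidden_dim=512)
frame_repr = attend(torch.randn(36, 2048), torch.randn(512))  # e.g. 36 detected regions
```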

[0100] (4) According to the methods listed in formulas (8)-(13), respectively c...
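Paragraph [0100] is truncated and formulas (8)-(13) are not shown, so the sketch below only illustrates the general shape of such a step: soft temporal attention over the 43 per-frame representations feeding one LSTM decoding step. Every name and dimension is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalAttentionDecoder(nn.Module):
    """One step of an LSTM caption decoder with soft temporal attention over the T
    frame representations (a sketch in the spirit of formulas (8)-(13))."""
    def __init__(self, feat_dim, hidden_dim, vocab_size):
        super().__init__()
        self.score = nn.Linear(feat_dim + hidden_dim, 1)
        self.lstm = nn.LSTMCell(feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def step(self, frame_feats, h, c):
        # frame_feats: (T, feat_dim) per-frame representations; h, c: (hidden_dim,)
        T = frame_feats.size(0)
        pairs = torch.cat([frame_feats, h.expand(T, -1)], dim=1)
        beta = F.softmax(self.score(pairs).squeeze(-1), dim=0)    # (T,) temporal weights
        context = (beta.unsqueeze(-1) * frame_feats).sum(dim=0)   # attended video context
        h, c = self.lstm(context.unsqueeze(0), (h.unsqueeze(0), c.unsqueeze(0)))
        h, c = h.squeeze(0), c.squeeze(0)
        return self.out(h), h, c                                  # next-word logits

dec = TemporalAttentionDecoder(feat_dim=5120, hidden_dim=512, vocab_size=10000)
h = c = torch.zeros(512)
logits, h, c = dec.step(torch.randn(43, 5120), h, c)  # attend over the 43 pictures
```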



Abstract

The invention discloses a video content description method by means of space and time attention models. The global temporal structure of the video is captured by the time attention model, and the spatial structure of each frame is captured by the space attention model, so that the video description model grasps the main event in the video while its ability to identify local information is enhanced. The method includes preprocessing the video format; establishing the time and space attention models; and training and testing the video description model. By means of the time attention model, the main temporal structure of the video is maintained, and by means of the space attention model, key areas in each frame are attended to, so that the generated video description captures key but easily neglected details while grasping the main event in the video content.

Description

technical field

[0001] The invention belongs to the technical field of computer vision and natural language processing, and relates to a video content description method using a spatio-temporal attention model.

Background technique

[0002] Previous research work on video content description falls mainly into the following categories:

[0003] 1. Methods based on feature recognition and language template filling. Such a method proceeds in three steps. First, the video is converted into an image collection by sampling consecutive frames at a certain time interval. Second, a series of feature classifiers pre-trained on a large-scale image training set are used to classify and label the static and dynamic features in the video; these features can be subdivided into entities, entity attributes, interaction relationships between entities, scenes, and so on. Finally, according to the characteristics of human language, a "subj...
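The background description is cut off here; it presumably refers to a fixed sentence template (e.g. subject-verb-object) filled with the recognized labels. A toy illustration of that baseline follows; the template and labels are hypothetical, not taken from the patent.

```python
# Toy illustration of the template-filling baseline described above: recognized
# labels (entities, an action, a scene) are slotted into a fixed sentence pattern.

def fill_template(subject, verb, obj, scene):
    return f"A {subject} is {verb} a {obj} in the {scene}."

# e.g. classifiers detected: entity "man", action "riding", entity "bicycle", scene "park"
print(fill_template("man", "riding", "bicycle", "park"))
# -> "A man is riding a bicycle in the park."
```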


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06K9/00, G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/084, G06V20/46, G06N3/044, G06N3/045, G06F18/214
Inventors: 涂云斌, 颜成钢, 张曦珊
Owner: HANGZHOU DIANZI UNIV