Video content description method by means of space and time attention models

A technology relating to attention models and video content description, applied in neural learning methods, biological neural network models, and character and pattern recognition, which addresses problems such as the loss and neglect of key information.

Active Publication Date: 2017-08-18
HANGZHOU DIANZI UNIV

AI Technical Summary

Problems solved by technology

[0008] In order to overcome the problem that existing video content description methods ignore key information because the spatial structure within each frame is lost, and to further improve the accuracy of the description, the present invention adds a spatial attention model, yielding a new video content description method based on a spatio-temporal attention model.


Examples


Embodiment

[0096] With reference to Figure 2, a specific example of training and testing the video content description method is given below; the detailed calculation process is as follows:

[0097] (1) A certain video contains a total of 430 frames. First, the video format is preprocessed: the video to be described is converted into a set of pictures by sampling 10% of its frames (one picture every 10 frames), yielding 43 pictures;
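As a concrete sketch of this preprocessing step, the snippet below samples every tenth frame with OpenCV. The function name, the file path, and the use of OpenCV are illustrative assumptions, not part of the patent.

```python
import cv2  # OpenCV, assumed here as the video-decoding backend

def sample_frames(video_path, keep_ratio=0.1):
    """Keep a fixed fraction of frames, e.g. 43 pictures from a 430-frame video."""
    step = max(1, round(1 / keep_ratio))  # 10% of the frames -> one frame every 10
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames

pictures = sample_frames("example.mp4")  # a 430-frame clip yields 43 pictures
```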

[0098] (2) Use the pre-trained convolutional neural networks GoogLeNet, Faster R-CNN and C3D to extract the global features, local features and dynamic features of the whole video from the 43 pictures, and combine the global features and the dynamic features by the cascade (concatenation) method listed in formula (1);
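A minimal sketch of this extraction-and-cascade step, assuming PyTorch/torchvision: torchvision's ImageNet-pretrained GoogLeNet stands in for the patent's network, while the Faster R-CNN region features and C3D motion features are placeholder tensors (C3D is not bundled with torchvision). All dimensions are illustrative.

```python
import torch
import torchvision.models as models

# torchvision's ImageNet-pretrained GoogLeNet stands in for the patent's
# global-feature extractor; the exact weights used in the patent are unknown.
googlenet = models.googlenet(weights="IMAGENET1K_V1")
googlenet.fc = torch.nn.Identity()  # expose the 1024-d pooled feature, not class logits
googlenet.eval()

with torch.no_grad():
    pics = torch.randn(43, 3, 224, 224)      # the 43 sampled pictures, preprocessed
    global_feats = googlenet(pics)           # (43, 1024) per-picture global features
    local_feats = torch.randn(43, 36, 2048)  # placeholder: 36 Faster R-CNN regions per picture
    dynamic_feats = torch.randn(43, 4096)    # placeholder for C3D motion features

# Formula (1) is described as a cascade (concatenation) of global and dynamic features:
fused = torch.cat([global_feats, dynamic_feats], dim=1)  # (43, 1024 + 4096)
```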

[0099] (3) According to the methods listed in formulas (2)-(5), calculate the spatial representation of the local features on each picture;
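Formulas (2)-(5) are not reproduced on this page, so the following is only a generic soft spatial-attention sketch over per-region local features; the module name, dimensions, and the additive scoring form are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    """Soft attention over the R region features of one picture, in the spirit of
    formulas (2)-(5), whose exact form is not shown on this page."""
    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.w_feat = nn.Linear(feat_dim, hidden_dim)
        self.w_hid = nn.Linear(hidden_dim, hidden_dim)
        self.v = nn.Linear(hidden_dim, 1)

    def forward(self, regions, h):
        # regions: (R, feat_dim) local features of one picture; h: (hidden_dim,) decoder state
        scores = self.v(torch.tanh(self.w_feat(regions) + self.w_hid(h))).squeeze(-1)  # (R,)
        alpha = F.softmax(scores, dim=0)                    # attention weights over regions
        return (alpha.unsqueeze(-1) * regions).sum(dim=0)   # weighted spatial representation

attend = SpatialAttention(feat_dim=2048, hidden_dim=512)
frame_repr = attend(torch.randn(36, 2048), torch.randn(512))  # e.g. 36 detected regions
```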

[0100] (4) According to the methods listed in formulas (8)-(13), respectively c...
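Paragraph [0100] is truncated and formulas (8)-(13) are not shown, so the sketch below only illustrates the general shape of such a step: soft temporal attention over the 43 per-frame representations feeding one LSTM decoding step. Every name and dimension is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalAttentionDecoder(nn.Module):
    """One step of an LSTM caption decoder with soft temporal attention over the T
    frame representations (a sketch in the spirit of formulas (8)-(13))."""
    def __init__(self, feat_dim, hidden_dim, vocab_size):
        super().__init__()
        self.score = nn.Linear(feat_dim + hidden_dim, 1)
        self.lstm = nn.LSTMCell(feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def step(self, frame_feats, h, c):
        # frame_feats: (T, feat_dim) per-frame representations; h, c: (hidden_dim,)
        T = frame_feats.size(0)
        pairs = torch.cat([frame_feats, h.expand(T, -1)], dim=1)
        beta = F.softmax(self.score(pairs).squeeze(-1), dim=0)    # (T,) temporal weights
        context = (beta.unsqueeze(-1) * frame_feats).sum(dim=0)   # attended video context
        h, c = self.lstm(context.unsqueeze(0), (h.unsqueeze(0), c.unsqueeze(0)))
        h, c = h.squeeze(0), c.squeeze(0)
        return self.out(h), h, c                                  # next-word logits

dec = TemporalAttentionDecoder(feat_dim=5120, hidden_dim=512, vocab_size=10000)
h = c = torch.zeros(512)
logits, h, c = dec.step(torch.randn(43, 5120), h, c)  # attend over the 43 pictures
```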



Abstract

The invention discloses a video content description method by means of space and time attention models. The global temporal structure of the video is captured by the time attention model, and the spatial structure of each frame is captured by the space attention model, so that the video description model grasps the main event in the video while its ability to identify local information is enhanced. The method includes preprocessing the video format; establishing the time and space attention models; and training and testing the video description model. By means of the time attention model, the main temporal structure of the video is maintained, and by means of the space attention model, key areas in each frame are attended to, so that the generated video description captures key but easily neglected details while grasping the main event in the video content.

Description

technical field

[0001] The invention belongs to the technical field of computer vision and natural language processing, and relates to a video content description method using a spatio-temporal attention model.

Background technique

[0002] Previous research work on video content description falls mainly into the following categories:

[0003] 1. Methods based on feature recognition and language template filling. Such a method proceeds in three steps. First, the video is converted into an image collection by sampling consecutive frames at a certain time interval. Second, a series of feature classifiers pre-trained on a large-scale image training set are used to classify and label the static and dynamic features in the video; these features can be subdivided into entities, entity attributes, interaction relationships between entities, scenes, and so on. Finally, according to the characteristics of human language, a "subj...
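The background description is cut off here; it presumably refers to a fixed sentence template (e.g. subject-verb-object) filled with the recognized labels. A toy illustration of that baseline follows; the template and labels are hypothetical, not taken from the patent.

```python
# Toy illustration of the template-filling baseline described above: recognized
# labels (entities, an action, a scene) are slotted into a fixed sentence pattern.

def fill_template(subject, verb, obj, scene):
    return f"A {subject} is {verb} a {obj} in the {scene}."

# e.g. classifiers detected: entity "man", action "riding", entity "bicycle", scene "park"
print(fill_template("man", "riding", "bicycle", "park"))
# -> "A man is riding a bicycle in the park."
```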


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06K9/00, G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/084, G06V20/46, G06N3/044, G06N3/045, G06F18/214
Inventors: 涂云斌, 颜成钢, 张曦珊
Owner: HANGZHOU DIANZI UNIV