Multi-event video description method based on dynamic attention mechanism

A technology relating to video description and attention, applied in video data indexing, video data retrieval, metadata video data retrieval, etc.; it can solve problems such as low accuracy and poor parallelism.

Active Publication Date: 2020-03-27
STATE GRID JIANGSU ELECTRIC POWER ENG CONSULTING CO LTD +3

AI Technical Summary

Problems solved by technology

[0007] In order to solve the problems of poor parallelism and low accuracy in existing dense video description generation algorithms, the present invention provides a multi-event video description method based on a dynamic attention mechanism that accurately locates and describes the events in a video.



Examples


Embodiment

[0060] As shown in Figure 1, Figure 2 and Figure 3, the present invention designs a multi-event video description method based on a dynamic attention mechanism. The method specifically comprises the following steps:

[0061] Step 1: Use a convolutional neural network (a 3D-CNN in this embodiment) to extract the visual features V = {v_1, v_2, ..., v_T} of the video sequence X = {x_1, x_2, ..., x_L}.

[0062] For a video sequence X = {x_1, x_2, ..., x_L} of L frames, a 3D-CNN pre-trained on the Sports-1M video dataset is used to extract features from the video frames. The temporal resolution of the extracted C3D features is δ = 16 frames, so the input video stream is discretized into T = L/δ steps, and the resulting feature sequence is V = {v_1, v_2, ..., v_T}.
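
As an illustration of this step, the clip-level feature extraction can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the frames are already decoded into a tensor, and c3d stands in for a 3D-CNN pre-trained on Sports-1M; the wrapper function and feature dimension are hypothetical and not taken from the patent text.

import torch

DELTA = 16  # temporal resolution of the C3D features: one feature vector per 16 frames

def extract_visual_features(frames: torch.Tensor, c3d: torch.nn.Module) -> torch.Tensor:
    """frames: (L, C, H, W) decoded video of L frames -> (T, d) feature sequence V with T = L // DELTA."""
    num_steps = frames.shape[0] // DELTA                  # T = L / delta
    features = []
    with torch.no_grad():
        for t in range(num_steps):
            clip = frames[t * DELTA:(t + 1) * DELTA]      # 16-frame clip
            clip = clip.permute(1, 0, 2, 3).unsqueeze(0)  # (1, C, 16, H, W), the layout a 3D-CNN expects
            features.append(c3d(clip).flatten(1))         # (1, d) clip-level feature v_t
    return torch.cat(features, dim=0)                     # V = {v_1, ..., v_T}, shape (T, d)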

[0063] Step 2: Input the visual features V of the video into the L-layer self-attention video coding layer to obtain the coded representation of the video {F_1, F_2, ...}.
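
A minimal sketch of such an L-layer self-attention video coding stack is given below, assuming standard Transformer encoder blocks; the layer count, feature dimension and number of attention heads are illustrative choices rather than values specified by the patent at this point.

import torch
import torch.nn as nn

class SelfAttentionVideoEncoder(nn.Module):
    """Stack of self-attention (Transformer encoder) layers applied to the visual features V."""

    def __init__(self, feat_dim: int = 512, num_layers: int = 2, num_heads: int = 8):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=feat_dim, nhead=num_heads, batch_first=True)
            for _ in range(num_layers)
        )

    def forward(self, v: torch.Tensor) -> list:
        """v: (B, T, d) visual features -> [F_1, F_2, ...], one coded representation per layer."""
        encodings = []
        hidden = v
        for layer in self.layers:
            hidden = layer(hidden)    # multi-head self-attention + feed-forward block
            encodings.append(hidden)  # keep every layer's output F_l
        return encodings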



Abstract

The invention discloses a multi-event video description method based on a dynamic attention mechanism. The method comprises the following steps: inputting a video sequence into a three-dimensional convolutional neural network and extracting the visual features of the video; coding the visual features with a video coding layer based on an attention mechanism and inputting the feature codes into an event prediction layer; predicting each event in the event prediction layer according to the video coding information; and acquiring, in the event description layer, the visual features of each event according to the event prediction result and dynamically generating a text description of each event in combination with the context information of the event description layer. The method overcomes the defects of poor parallelism and low efficiency of existing multi-event video description methods, ensures the accuracy of video description generation, and allows the model to be trained in an end-to-end manner.
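
Read as a pipeline, the abstract describes a video encoder, an event prediction layer and an event description layer composed end to end. The skeleton below is only a hedged sketch of that composition; the event predictor and the dynamic-attention captioner are placeholder modules with hypothetical interfaces, not the patent's actual implementation.

import torch
import torch.nn as nn

class MultiEventCaptioner(nn.Module):
    """Encoder -> event prediction layer -> event description layer, composed for end-to-end training."""

    def __init__(self, encoder: nn.Module, event_predictor: nn.Module, captioner: nn.Module):
        super().__init__()
        self.encoder = encoder                  # attention-based video coding layers
        self.event_predictor = event_predictor  # placeholder: predicts event segments from the encodings
        self.captioner = captioner              # placeholder: generates one sentence per predicted event

    def forward(self, visual_features: torch.Tensor):
        encodings = self.encoder(visual_features)                       # coded representation of the video
        events = self.event_predictor(encodings)                        # e.g. a list of (start, end, score) proposals
        captions = [self.captioner(encodings, event) for event in events]
        return events, captions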

Description

Technical field

[0001] The invention relates to a multi-event video description method based on a dynamic attention mechanism, and belongs to the field of video description in computer vision.

Background technique

[0002] Video tagging is a technology that analyzes video content and produces classification tags. Video tagging can effectively extract the key information of a video and is widely used in video storage and retrieval, but a video tag cannot convey the more detailed information of a video. Video captioning is the process of automatically generating a natural language description of a video by computer. A video description not only extracts the key elements of a video but also reflects the relationships between those elements through sentence descriptions. Therefore, video description can be applied in video storage and retrieval, human-computer interaction, knowledge extraction and other fiel...


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F16/71; G06F16/78; G06F16/75; G06N3/04
CPC: G06F16/71; G06F16/7867; G06F16/75; G06N3/045; Y02T10/40
Inventor 谢洪平刘迪诸雅琴黄涛陈勇杜长青吴威王昊林东阳陈喆
Owner STATE GRID JIANGSU ELECTRIC POWER ENG CONSULTING CO LTD