Multi-event video description method based on dynamic attention mechanism

A technology relating to video description and attention, applied in video data indexing, video data retrieval, metadata video data retrieval, etc.; it can solve problems such as low accuracy and poor parallelism.

Active Publication Date: 2020-03-27
STATE GRID JIANGSU ELECTRIC POWER ENG CONSULTING CO LTD +3

AI Technical Summary

Problems solved by technology

[0007] In order to solve the problems of poor parallelism and low accuracy in existing dense video description generation algorithms, the present invention provides a multi-event video description method based on a dynamic attention mechanism that accurately locates and describes the events in a video.



Examples


Embodiment

[0060] As shown in Figure 1, Figure 2 and Figure 3, the present invention designs a multi-event video description method based on a dynamic attention mechanism. The method specifically comprises the following steps:

[0061] Step 1: Use a convolutional neural network (a 3D-CNN in this embodiment) to extract the visual features V = {v_1, v_2, ..., v_T} of the video sequence X = {x_1, x_2, ..., x_L}.

[0062] For a video sequence X = {x_1, x_2, ..., x_L} of L frames, a 3D-CNN pre-trained on the Sports-1M video dataset is used to extract features from the video frames. The temporal resolution of the extracted C3D features is δ = 16 frames, so the input video stream is discretized into T = L/δ steps, and the resulting feature sequence is V = {v_1, v_2, ..., v_T}.
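
As an illustration of this step, the clip-level feature extraction can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the frames are already decoded into a tensor, and c3d stands in for a 3D-CNN pre-trained on Sports-1M; the wrapper function and feature dimension are hypothetical and not taken from the patent text.

import torch

DELTA = 16  # temporal resolution of the C3D features: one feature vector per 16 frames

def extract_visual_features(frames: torch.Tensor, c3d: torch.nn.Module) -> torch.Tensor:
    """frames: (L, C, H, W) decoded video of L frames -> (T, d) feature sequence V with T = L // DELTA."""
    num_steps = frames.shape[0] // DELTA                  # T = L / delta
    features = []
    with torch.no_grad():
        for t in range(num_steps):
            clip = frames[t * DELTA:(t + 1) * DELTA]      # 16-frame clip
            clip = clip.permute(1, 0, 2, 3).unsqueeze(0)  # (1, C, 16, H, W), the layout a 3D-CNN expects
            features.append(c3d(clip).flatten(1))         # (1, d) clip-level feature v_t
    return torch.cat(features, dim=0)                     # V = {v_1, ..., v_T}, shape (T, d)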

[0063] Step 2: Input the visual features V of the video into the L-layer self-attention video coding layer to obtain the coded representation of the video {F_1, F_2, ...}.
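
A minimal sketch of such an L-layer self-attention video coding stack is given below, assuming standard Transformer encoder blocks; the layer count, feature dimension and number of attention heads are illustrative choices rather than values specified by the patent at this point.

import torch
import torch.nn as nn

class SelfAttentionVideoEncoder(nn.Module):
    """Stack of self-attention (Transformer encoder) layers applied to the visual features V."""

    def __init__(self, feat_dim: int = 512, num_layers: int = 2, num_heads: int = 8):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=feat_dim, nhead=num_heads, batch_first=True)
            for _ in range(num_layers)
        )

    def forward(self, v: torch.Tensor) -> list:
        """v: (B, T, d) visual features -> [F_1, F_2, ...], one coded representation per layer."""
        encodings = []
        hidden = v
        for layer in self.layers:
            hidden = layer(hidden)    # multi-head self-attention + feed-forward block
            encodings.append(hidden)  # keep every layer's output F_l
        return encodings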



Abstract

The invention discloses a multi-event video description method based on a dynamic attention mechanism. The method comprises the following steps: inputting a video sequence into a three-dimensional convolutional neural network and extracting the visual features of the video; coding the visual features with a video coding layer based on an attention mechanism and inputting the feature codes into an event prediction layer; predicting each event in the event prediction layer according to the video coding information; and acquiring, in the event description layer, the visual features of each event according to the event prediction result and dynamically generating a text description of each event in combination with the context information of the event description layer. The method overcomes the defects of poor parallelism and low efficiency of existing multi-event video description methods, ensures the accuracy of video description generation, and allows the model to be trained in an end-to-end manner.
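
Read as a pipeline, the abstract describes a video encoder, an event prediction layer and an event description layer composed end to end. The skeleton below is only a hedged sketch of that composition; the event predictor and the dynamic-attention captioner are placeholder modules with hypothetical interfaces, not the patent's actual implementation.

import torch
import torch.nn as nn

class MultiEventCaptioner(nn.Module):
    """Encoder -> event prediction layer -> event description layer, composed for end-to-end training."""

    def __init__(self, encoder: nn.Module, event_predictor: nn.Module, captioner: nn.Module):
        super().__init__()
        self.encoder = encoder                  # attention-based video coding layers
        self.event_predictor = event_predictor  # placeholder: predicts event segments from the encodings
        self.captioner = captioner              # placeholder: generates one sentence per predicted event

    def forward(self, visual_features: torch.Tensor):
        encodings = self.encoder(visual_features)                       # coded representation of the video
        events = self.event_predictor(encodings)                        # e.g. a list of (start, end, score) proposals
        captions = [self.captioner(encodings, event) for event in events]
        return events, captions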

Description

Technical field

[0001] The invention relates to a multi-event video description method based on a dynamic attention mechanism, and belongs to the field of video description in computer vision.

Background technique

[0002] Video tagging is a technology that analyzes video content and produces classification tags. Video tagging can effectively extract the key information of a video and is widely used in video storage and retrieval, but a video tag cannot convey the more detailed information of a video. Video captioning is the process of automatically generating a natural language description of a video by computer. A video description not only extracts the key elements of a video but also reflects the relationships between those elements through sentence descriptions. Therefore, video description can be applied in video storage and retrieval, human-computer interaction, knowledge extraction and other fiel...


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F16/71; G06F16/78; G06F16/75; G06N3/04
CPC: G06F16/71; G06F16/7867; G06F16/75; G06N3/045; Y02T10/40
Inventor 谢洪平刘迪诸雅琴黄涛陈勇杜长青吴威王昊林东阳陈喆
Owner STATE GRID JIANGSU ELECTRIC POWER ENG CONSULTING CO LTD