A Bidirectional Reconstruction Network Video Description Method Based on Hierarchical Attention Mechanism

A video description and attention technology, applied in the computer field. It addresses problems such as interference from irrelevant background information, insufficient capture of video semantic information, and failure to consider the correlation of video frame regions, and it achieves the effect of reducing interference.

Active Publication Date: 2021-04-20
南京赤马信息技术有限公司

AI Technical Summary

Problems solved by technology

[0005] The shortcomings of the above methods are mainly manifested in three aspects. First, video frame features are extracted at a single scale, which makes it difficult to fully represent the rich information in a video. Second, only the forward propagation of information from video content to text description is considered, and the reverse propagation from text description to video content is not, so the semantic similarity between the generated text description and the video content is low. Third, the correlation between video frame region features and the generated text description is not considered, so when the described object is small, irrelevant background information is easily introduced and interferes with the generation of the text description.
As a result, these methods struggle to describe video content accurately and cannot fully capture video semantic information.

Examples

Embodiment Construction

[0037] The present invention will be further described below in conjunction with the accompanying drawings.

[0038] The bidirectional reconstruction network video description method based on a hierarchical attention mechanism focuses on extracting multi-scale video features to fully represent the temporal and spatial structure of the video. It also uses the hierarchical attention mechanism so that the constructed bidirectional reconstruction network model attends to the video features most relevant to the description sentence being generated. The main idea is to use a convolutional neural network as the encoder to extract multi-scale regional features of the video frames, and to process these features with the hierarchical attention mechanism to obtain a dynamic representation of the video features; to use a long short-term memory (LSTM) neural network as the decoder, obtaining the probability distribution over vocabulary words by minimizing the cross-entropy loss function and generating sentences accordingly; the reconstr...
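The two-level attention idea in [0038] can be illustrated with a small sketch. This is not the patent's formulation: it assumes plain dot-product scoring against a decoder hidden state (`query`), first over the regions inside each frame and then over the resulting frame vectors. The function names, the toy feature values, and the scoring function are all hypothetical choices made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(vectors, query):
    """Dot-product attention (an assumed scoring function).

    Returns the attention-weighted sum of `vectors` and the weights.
    """
    weights = softmax([dot(v, query) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(dim)]
    return pooled, weights

def hierarchical_attention(video, query):
    """video: list of frames, each frame a list of region feature vectors.

    Level 1 pools the regions of each frame into one frame vector.
    Level 2 pools the frame vectors into a single context vector.
    """
    frame_vectors = [attend(regions, query)[0] for regions in video]
    return attend(frame_vectors, query)

# Toy input: 3 frames, 2 regions per frame, 2-dimensional features.
video = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.2, 0.8]],
    [[0.0, 0.0], [1.0, 1.0]],
]
query = [1.0, 0.0]  # stands in for the decoder hidden state
context, frame_weights = hierarchical_attention(video, query)
```

Because the query changes at every decoding step, the context vector changes too, which is one way to read the "dynamic representation" of video features mentioned above. A real model would use learned projections rather than raw dot products.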

Abstract

The invention discloses a bidirectional reconstruction network video description method based on a hierarchical attention mechanism. First, the method uses a convolutional neural network as the encoder to extract multi-scale regional features of the video frames, and processes the video features with the hierarchical attention mechanism to obtain a dynamic representation of the video features. Second, it uses a long short-term memory (LSTM) neural network as the decoder, takes the dynamic feature representation and its text description as input, obtains the probability distribution over vocabulary words by minimizing the cross-entropy loss function, and generates sentences accordingly. Third, it constructs a bidirectional reconstruction network that takes the hidden vectors of the decoder as input and outputs reconstructed video features by minimizing the reconstruction loss, so that the generated text description and the video content have high semantic similarity. The invention can effectively extract multi-scale video features that reflect the temporal and spatial structure of the video, reduce the interference of irrelevant information, mine latent video semantic information, and generate more accurate, natural, and fluent descriptions of video content.
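The abstract names two training objectives, a cross-entropy loss for the decoder and a reconstruction loss for the reconstruction network, without giving their exact form here. The sketch below shows one common way such terms are combined. It assumes a mean squared Euclidean distance for the reconstruction term and a scalar weight `lam`; both are hypothetical and are not taken from the patent.

```python
import math

def cross_entropy(step_probs, target_ids):
    """Mean negative log-likelihood of the target word at each step.

    step_probs: one probability distribution (list of floats) per step.
    target_ids: index of the ground-truth word at each step.
    """
    nll = [-math.log(p[t]) for p, t in zip(step_probs, target_ids)]
    return sum(nll) / len(nll)

def reconstruction_loss(original, reconstructed):
    """Mean squared Euclidean distance between feature vectors (assumed form)."""
    total = 0.0
    for o, r in zip(original, reconstructed):
        total += sum((a - b) ** 2 for a, b in zip(o, r))
    return total / len(original)

def total_loss(step_probs, target_ids, original, reconstructed, lam=0.2):
    """Caption loss plus weighted reconstruction loss; lam is a hypothetical weight."""
    return (cross_entropy(step_probs, target_ids)
            + lam * reconstruction_loss(original, reconstructed))

# Toy example: a 2-step caption over a 3-word vocabulary, 2 feature vectors.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
targets = [0, 1]
features = [[1.0, 2.0], [0.0, 1.0]]
rebuilt = [[0.9, 2.1], [0.2, 1.0]]
loss = total_loss(probs, targets, features, rebuilt)
```

The reconstruction term penalizes decoder hidden states that cannot be mapped back to the video features, which is the mechanism the abstract credits for higher semantic similarity between text and video.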

Description

Technical field

[0001] The invention belongs to the technical field of computers, in particular to the technical field of video description in visual computing, and relates to a bidirectional reconstruction network video description method based on a hierarchical attention mechanism.

Background technique

[0002] In today's Internet era, live-streaming platforms, video surveillance systems, and smart devices such as mobile phones generate a large amount of video data every day, and this data is growing explosively. Describing the content of these videos manually is time-consuming and labor-intensive, which gave rise to the field of video description. Video description methods are mainly used in practical application scenarios such as video title generation, video retrieval, and helping visually impaired people understand videos.

[0003] The video description task is to describe the content of the video with a piece of text. Its goal is not only to capture the complex high-dimen...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/00, G06N3/04
CPC: G06N3/049, G06V20/41, G06V20/46, G06N3/045
Inventors: 李平, 张盼, 胡海洋, 徐向华
Owner: 南京赤马信息技术有限公司