
Transformer and deep reinforcement learning-based video summary generation network

A technology combining reinforcement learning and video summarization, applied in the field of computer vision, which addresses the problem that summaries lack temporal coherence

Pending Publication Date: 2022-06-21
武光利 +3

AI Technical Summary

Problems solved by technology

The biggest disadvantage of static summaries is that the synthesized summaries lack temporal coherence and give a fast-forward impression, whereas dynamic summaries combine shots to retain visual coherence without losing key content.




Detailed Description of the Embodiments

[0047] In order to enable those skilled in the art to better understand the present invention, the technical solutions of the present invention are further described below with reference to the accompanying drawings and embodiments.

[0048] As shown in Figure 1, the video summary generation network based on Transformer and deep reinforcement learning includes three parts: encoding, decoding, and optimization.

[0049] The encoding part extracts deep features from the video frames with GoogLeNet and feeds the feature vectors into the Transformer encoder. Positional encoding is applied first, and the result is passed to the self-attention layer; after that computation, a residual connection and layer normalization are applied, followed by a feed-forward neural network and a second residual connection and layer normalization.
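A minimal PyTorch sketch of the encoder stage described in paragraph [0049] is given below. It is illustrative only: the 1024-dimensional input (matching GoogLeNet's pooled feature size), 8 attention heads, and the feed-forward width are assumptions, not values taken from the patent.

```python
import math
import torch
import torch.nn as nn

class TransformerEncoderBlock(nn.Module):
    """Encoder stage: positional encoding, self-attention, residual connection +
    layer normalization, feed-forward network, and a second residual + layer norm."""

    def __init__(self, d_model=1024, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    @staticmethod
    def positional_encoding(seq_len, d_model, device):
        # Standard sinusoidal positional encoding.
        pos = torch.arange(seq_len, device=device, dtype=torch.float).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2, device=device, dtype=torch.float)
                        * (-math.log(10000.0) / d_model))
        pe = torch.zeros(seq_len, d_model, device=device)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        return pe

    def forward(self, x):  # x: (batch, M, d_model) per-frame GoogLeNet features
        x = x + self.positional_encoding(x.size(1), x.size(2), x.device)
        attn_out, _ = self.attn(x, x, x)          # self-attention
        x = self.norm1(x + attn_out)              # residual + layer norm
        x = self.norm2(x + self.ffn(x))           # residual + layer norm
        return x
```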

[0050] After deep features are extracted from the video frames through GoogLeNet, assuming there are M frames in total, th...
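One plausible way to obtain such per-frame deep features is sketched below, using torchvision's pretrained GoogLeNet with its classifier replaced by an identity mapping so that each frame yields a 1024-dimensional pooled feature. The preprocessing values are the standard ImageNet ones; none of these specifics are stated in the patent.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Pretrained GoogLeNet whose final classifier is replaced by an identity,
# so each frame is mapped to its 1024-dimensional pooled feature.
backbone = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(frames):
    """frames: list of M HxWx3 uint8 frames -> (M, 1024) feature tensor."""
    batch = torch.stack([preprocess(f) for f in frames])
    return backbone(batch)
```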



Abstract

Video summarization extracts key frames or key shots from an original video to generate a summary that greatly shortens viewing time without losing the main content, enabling quick browsing. Most existing methods improve only on image features, neglect the temporal order between images, and lack the ability to learn autonomously. The invention provides a video summarization network built on Transformer and deep reinforcement learning. The network takes the Transformer encoder-decoder as its main structure: the encoder consists of the Transformer's two modules, a self-attention module and a feed-forward neural network module, while the Transformer decoder is replaced by a BiLSTM (Bidirectional Long Short-Term Memory) network combined with reinforcement learning. Experiments on two public standard video summarization datasets demonstrate the effectiveness of the method. The Transformer encoder has an excellent capability for processing image features, and the BiLSTM in the decoder decodes temporal sequence data well.
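For illustration, the sketch below wires the components named in the abstract together: a Transformer encoder layer over per-frame features, a BiLSTM decoder, and a head producing frame-level importance scores. Module choices, dimensions, and the REINFORCE-style training mentioned in the closing comment are assumptions about how such a network could be assembled, not the patented implementation.

```python
import torch
import torch.nn as nn

class SummaryNetwork(nn.Module):
    """Transformer encoder layer over per-frame features, followed by a
    BiLSTM decoder and a head that outputs per-frame importance scores."""

    def __init__(self, d_model=1024, n_heads=8, hidden=256):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=2048, batch_first=True)
        self.decoder = nn.LSTM(d_model, hidden, batch_first=True,
                               bidirectional=True)
        self.score = nn.Sequential(nn.Linear(2 * hidden, 1), nn.Sigmoid())

    def forward(self, features):                # features: (batch, M, d_model)
        encoded = self.encoder(features)
        decoded, _ = self.decoder(encoded)      # (batch, M, 2 * hidden)
        return self.score(decoded).squeeze(-1)  # importance scores in [0, 1]

# The scores can be sampled as Bernoulli frame-selection actions and trained
# with a policy-gradient (REINFORCE-style) reward; that optimization step is
# where the deep reinforcement learning part of such a network would sit.
scores = SummaryNetwork()(torch.randn(1, 30, 1024))  # e.g. M = 30 frames
```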

Description

Technical Field

[0001] The invention relates to a video summary generation network based on Transformer and deep reinforcement learning, and belongs to the technical field of computer vision.

Background

[0002] With the development of Internet technology and the advancement of mobile communication devices, the online video field has grown by leaps and bounds. The "China Internet Development Report (2021)" shows that in 2020 the scale of China's online video market reached 241.2 billion yuan, a year-on-year increase of 44%, and the number of active online video users reached 1.001 billion, a year-on-year increase of 2.14%, bringing huge opportunities as well as many challenges. Video data is not only large in quantity but also diverse in type, including user-shot videos, short videos, surveillance videos, and news videos, which makes reviewing online video content more difficult, and users' demand for fast video browsing is increasing day...

Claims


Application Information

IPC(8): G06F16/738; G06F16/783; G06N3/04; G06N3/08
CPC: G06F16/739; G06F16/783; G06F16/7847; G06N3/08; G06N3/045; G06N3/044
Inventor: 武光利, 李雷霆, 张静, 牛君会
Owner: 武光利