
Transformer and deep reinforcement learning-based video summary generation network

A technology combining reinforcement learning and video summarization, applied in the field of computer vision, which addresses the problem that summaries lack temporal coherence

Pending Publication Date: 2022-06-21
武光利 +3

AI Technical Summary

Problems solved by technology

The biggest disadvantage of static summaries is that the synthesized summaries lack temporal coherence and give a fast-forward impression, whereas dynamic summaries combine shots to retain visual coherence without losing key content.




Detailed Description of the Embodiments

[0047] In order to enable those skilled in the art to better understand the present invention, the technical solutions of the present invention are further described below with reference to the accompanying drawings and embodiments.

[0048] As shown in Figure 1, the video summary generation network based on Transformer and deep reinforcement learning includes three parts: encoding, decoding, and optimization.

[0049] The encoding part extracts deep features from the video frames with GoogLeNet and feeds the feature vectors into the Transformer encoder. Positional encoding is applied first, and the result is passed to the self-attention layer; after that computation, a residual connection and layer normalization are applied, followed by a feed-forward neural network and a second residual connection and layer normalization.
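A minimal PyTorch sketch of the encoder stage described in paragraph [0049] is given below. It is illustrative only: the 1024-dimensional input (matching GoogLeNet's pooled feature size), 8 attention heads, and the feed-forward width are assumptions, not values taken from the patent.

```python
import math
import torch
import torch.nn as nn

class TransformerEncoderBlock(nn.Module):
    """Encoder stage: positional encoding, self-attention, residual connection +
    layer normalization, feed-forward network, and a second residual + layer norm."""

    def __init__(self, d_model=1024, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    @staticmethod
    def positional_encoding(seq_len, d_model, device):
        # Standard sinusoidal positional encoding.
        pos = torch.arange(seq_len, device=device, dtype=torch.float).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2, device=device, dtype=torch.float)
                        * (-math.log(10000.0) / d_model))
        pe = torch.zeros(seq_len, d_model, device=device)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        return pe

    def forward(self, x):  # x: (batch, M, d_model) per-frame GoogLeNet features
        x = x + self.positional_encoding(x.size(1), x.size(2), x.device)
        attn_out, _ = self.attn(x, x, x)          # self-attention
        x = self.norm1(x + attn_out)              # residual + layer norm
        x = self.norm2(x + self.ffn(x))           # residual + layer norm
        return x
```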

[0050] After deep features are extracted from the video frames through GoogLeNet, assuming there are M frames in total, th...
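One plausible way to obtain such per-frame deep features is sketched below, using torchvision's pretrained GoogLeNet with its classifier replaced by an identity mapping so that each frame yields a 1024-dimensional pooled feature. The preprocessing values are the standard ImageNet ones; none of these specifics are stated in the patent.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Pretrained GoogLeNet whose final classifier is replaced by an identity,
# so each frame is mapped to its 1024-dimensional pooled feature.
backbone = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(frames):
    """frames: list of M HxWx3 uint8 frames -> (M, 1024) feature tensor."""
    batch = torch.stack([preprocess(f) for f in frames])
    return backbone(batch)
```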



Abstract

Video summarization extracts key frames or key shots from an original video to generate a summary that greatly shortens viewing time without losing the main content, enabling quick browsing. Most existing methods improve only on image features, neglect the temporal order between images, and lack the ability to learn autonomously. The invention provides a video summarization network built on Transformer and deep reinforcement learning. The network takes the Transformer encoder-decoder as its main structure: the encoder consists of the Transformer's two modules, a self-attention module and a feed-forward neural network module, while the Transformer decoder is replaced by a BiLSTM (Bidirectional Long Short-Term Memory) network combined with reinforcement learning. Experiments on two public standard video summarization datasets demonstrate the effectiveness of the method. The Transformer encoder has an excellent capability for processing image features, and the BiLSTM in the decoder decodes temporal sequence data well.
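For illustration, the sketch below wires the components named in the abstract together: a Transformer encoder layer over per-frame features, a BiLSTM decoder, and a head producing frame-level importance scores. Module choices, dimensions, and the REINFORCE-style training mentioned in the closing comment are assumptions about how such a network could be assembled, not the patented implementation.

```python
import torch
import torch.nn as nn

class SummaryNetwork(nn.Module):
    """Transformer encoder layer over per-frame features, followed by a
    BiLSTM decoder and a head that outputs per-frame importance scores."""

    def __init__(self, d_model=1024, n_heads=8, hidden=256):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=2048, batch_first=True)
        self.decoder = nn.LSTM(d_model, hidden, batch_first=True,
                               bidirectional=True)
        self.score = nn.Sequential(nn.Linear(2 * hidden, 1), nn.Sigmoid())

    def forward(self, features):                # features: (batch, M, d_model)
        encoded = self.encoder(features)
        decoded, _ = self.decoder(encoded)      # (batch, M, 2 * hidden)
        return self.score(decoded).squeeze(-1)  # importance scores in [0, 1]

# The scores can be sampled as Bernoulli frame-selection actions and trained
# with a policy-gradient (REINFORCE-style) reward; that optimization step is
# where the deep reinforcement learning part of such a network would sit.
scores = SummaryNetwork()(torch.randn(1, 30, 1024))  # e.g. M = 30 frames
```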

Description

Technical Field

[0001] The invention relates to a video summary generation network based on Transformer and deep reinforcement learning, and belongs to the technical field of computer vision.

Background

[0002] With the development of Internet technology and the advancement of mobile communication devices, the online video field has grown by leaps and bounds. The "China Internet Development Report (2021)" shows that in 2020 the scale of China's online video market reached 241.2 billion yuan, a year-on-year increase of 44%, and the number of active online video users reached 1.001 billion, a year-on-year increase of 2.14%, bringing huge opportunities as well as many challenges. Video data is not only large in quantity but also diverse in type, including user-shot videos, short videos, surveillance videos, and news videos, which makes reviewing online video content more difficult, and users' demand for fast video browsing is increasing day...

Claims


Application Information

IPC(8): G06F16/738; G06F16/783; G06N3/04; G06N3/08
CPC: G06F16/739; G06F16/783; G06F16/7847; G06N3/08; G06N3/045; G06N3/044
Inventor: 武光利, 李雷霆, 张静, 牛君会
Owner: 武光利