A Video Content Description Method Based on Text Autoencoder

A text autoencoder and video content description technology, applied in the computer field, that addresses problems such as failing to make full use of rich text features, wasting computing resources, and ignoring the guiding role of text in parameter updates. It reduces training difficulty and model construction overhead, enhances the ability to fit data, and improves the quality of content descriptions.

Active Publication Date: 2021-07-13

HANGZHOU DIANZI UNIV

Cites: 5; Cited by: 0

AI Technical Summary

Problems solved by technology

[0005] The shortcomings of the above methods are mainly manifested in the following aspects. First, mainstream video description methods mainly use cross-entropy to calculate the loss, which suffers from error accumulation; reinforcement learning can avoid this drawback, but it is computationally expensive and difficult to converge. Second, the above methods consider only the features of the video and do not make full use of the rich features contained in the video's text, ignoring the guiding role that the text, as prior information, can play in updating the description model's parameters. Third, the recurrent neural network has a sequential structure: the unit at the current time step depends on the outputs of all previous units and cannot be processed in parallel, which wastes computing resources. In addition, vanishing gradients sometimes prevent the weights from being updated accurately, making it difficult to generate coherent sentences that match the video content.

Embodiment Construction

[0048] The present invention will be further described below in conjunction with accompanying drawing.

[0049] A video content description method based on a text autoencoder focuses on building a text autoencoder that learns the corresponding latent space features and reconstructs the text using a multi-head attention residual network. This allows it to generate text descriptions that better match the actual content of the video and to fully mine the latent relationships between video content semantics and video text descriptions. The self-attention network, composed of self-attention modules and fully connected mappings, can effectively capture long-range action sequence features in videos and improve the model's computational efficiency, while enhancing the neural network's ability to fit data (that is, using the neural network to fit the text latent space feature matrix), thereby improving the quality of video content descriptions; the use of multi-head attention residual netwo...
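As a rough illustration of the self-attention network described in [0049], the following NumPy sketch applies multi-head scaled dot-product self-attention with a residual connection to a sequence of per-frame video features, then uses a fully connected mapping to produce an estimated text latent feature matrix. The frame count, feature width, head count, latent shape, random weights, and the flatten-then-project mapping are all assumptions made for illustration. The patent text shown here does not give these details, so this is a hypothetical sketch rather than the inventors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention over the rows (frames) of x, shape (T, d)."""
    t, d = x.shape
    if d % num_heads != 0:
        raise ValueError("feature width must be divisible by num_heads")
    d_head = d // num_heads
    # Project, then split the feature axis into heads: (heads, T, d_head).
    q = (x @ w_q).reshape(t, num_heads, d_head).transpose(1, 0, 2)
    k = (x @ w_k).reshape(t, num_heads, d_head).transpose(1, 0, 2)
    v = (x @ w_v).reshape(t, num_heads, d_head).transpose(1, 0, 2)
    # All frame pairs are scored in one batched product, so no step waits
    # on the previous one as it would in a recurrent network.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, T, T)
    weights = softmax(scores, axis=-1)
    heads = weights @ v  # (heads, T, d_head)
    # Concatenate heads back to (T, d) and apply the output projection.
    return heads.transpose(1, 0, 2).reshape(t, d) @ w_o


def estimate_text_latent(frames, params, num_heads, latent_shape):
    """Hypothetical video-to-text-latent mapping: attention + residual + FC."""
    # Residual connection around the attention block.
    h = frames + multi_head_self_attention(
        frames, params["w_q"], params["w_k"], params["w_v"], params["w_o"], num_heads
    )
    # Fully connected mapping of the flattened (T * d) sequence to the latent matrix.
    z = h.reshape(-1) @ params["w_fc"] + params["b_fc"]
    return z.reshape(latent_shape)


# Usage with random stand-in data (all sizes are hypothetical).
rng = np.random.default_rng(0)
T, d, num_heads = 8, 16, 4
latent_shape = (5, 12)
params = {name: rng.normal(scale=0.1, size=(d, d)) for name in ("w_q", "w_k", "w_v", "w_o")}
params["w_fc"] = rng.normal(scale=0.1, size=(T * d, latent_shape[0] * latent_shape[1]))
params["b_fc"] = np.zeros(latent_shape[0] * latent_shape[1])
frames = rng.normal(size=(T, d))  # stand-in for fused 2D/3D CNN frame features
z_hat = estimate_text_latent(frames, params, num_heads, latent_shape)
print(z_hat.shape)  # (5, 12)
```

Because every frame attends to every other frame in a single matrix product, all time steps are computed at once. This is the parallelism advantage over recurrent networks that [0005] points to. Fixing the number of frames T is what makes the flatten-then-project mapping possible here; a real system might pool over time instead.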

Abstract

The invention discloses a video content description method based on a text autoencoder. The method first uses two-dimensional and three-dimensional convolutional neural networks to extract video features. Second, it constructs a text autoencoder: the encoder, a text convolutional network, extracts text latent space features, and the decoder, a multi-head attention residual network, reconstructs the text. Third, estimated text latent space features are obtained from the video features through a self-attention mechanism and a fully connected mapping. Finally, the above models are optimized alternately with the adaptive moment estimation (Adam) algorithm; for a new video, the constructed text autoencoder and convolutional neural networks are used to obtain the corresponding video content description. By training the text autoencoder, the method can fully exploit the latent relationships between video content semantics and video text descriptions, and by using the self-attention mechanism it captures action sequence information spanning long periods of the video. This improves the model's computational efficiency and produces text descriptions that better match the actual content of the video.
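The abstract states that the models are optimized alternately with adaptive moment estimation (Adam). The sketch below shows only the Adam update itself, applied to one hypothetical half of that alternation: the text latent features are held fixed, and a linear map from video features is fitted to them under a mean-squared-error loss. The linear map, the MSE loss, the synthetic data, and the hyperparameters (the commonly used Adam defaults β1 = 0.9, β2 = 0.999, ε = 1e-8, with a larger learning rate for this toy problem) are assumptions, not details taken from the patent.

```python
import numpy as np


def adam_fit_linear(x, target, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=500, seed=0):
    """Fit W so that x @ W approximates target, minimising mean squared error with Adam."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=(x.shape[1], target.shape[1]))
    m = np.zeros_like(w)  # moving average of gradients (first moment)
    v = np.zeros_like(w)  # moving average of squared gradients (second moment)
    losses = []
    for t in range(1, steps + 1):
        residual = x @ w - target
        losses.append(float(np.mean(residual ** 2)))
        # Gradient of mean(residual ** 2) with respect to w.
        grad = 2.0 * x.T @ residual / residual.size
        m = beta1 * m + (1.0 - beta1) * grad
        v = beta2 * v + (1.0 - beta2) * grad ** 2
        # Bias correction compensates for m and v starting at zero.
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, losses


# Hypothetical data: the "text latent" target is held fixed while the mapping is fitted.
rng = np.random.default_rng(1)
video_feats = rng.normal(size=(64, 10))  # stand-in for pooled video features
true_map = rng.normal(size=(10, 6))
text_latent = video_feats @ true_map  # stand-in for fixed autoencoder latent features
w, losses = adam_fit_linear(video_feats, text_latent)
print(losses[-1] < losses[0])  # True
```

In a full alternating scheme, a step like this would be interleaved with steps that update the autoencoder while the mapping is held fixed; that second half is not shown.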

Description

Technical Field [0001] The invention belongs to the field of computer technology, specifically video content description, and relates to a video content description method based on a text autoencoder.

Background Art [0002] In recent years, with the continuous development of information technology and the iterative upgrading of smart devices, people increasingly prefer to convey information through video, so the volume of all kinds of video data keeps growing, which also brings great challenges. For example, hundreds of thousands of videos are uploaded to the servers of video-sharing websites every minute. Manually reviewing whether these videos comply with the rules is time-consuming and labor-intensive, whereas video description methods can significantly improve the efficiency of this review work and save a great deal of time and labor. Video content description technology can be widely used in practical scenarios such ...

Application Information

Patent Type & Authority: Patents (China)

IPC (8): G06K9/00, G06K9/62, G06N3/04, G06N3/08

CPC: G06N3/08, G06V20/40, G06N3/047, G06N3/045, G06F18/2415, G06F18/241

Inventors: 李平 (Li Ping), 张致远 (Zhang Zhiyuan), 徐向华 (Xu Xianghua)

Owner: HANGZHOU DIANZI UNIV