Video content description method based on text auto-encoder

An auto-encoder-based video content description technology, applied in the computer field, which can solve the problems of ignoring the guiding role of text in parameter updating, wasting computing resources, and being unable to update weights accurately.

Active Publication Date: 2020-04-28
HANGZHOU DIANZI UNIV
Cites: 5 · Cited by: 21

AI Technical Summary

Problems solved by technology

[0005] The shortcomings of the above methods are mainly manifested in three aspects. First, mainstream video description methods compute the loss with cross-entropy, which suffers from error accumulation; reinforcement learning can avoid this drawback, but it is computationally expensive and difficult to converge. Second, the above methods consider only the features of the video and do not make full use of the rich features contained in the video text, ignoring the guiding role of the text, as prior information, in updating the description model's parameters. Third, a recurrent neural network has a sequential structure: the unit at the current time step depends on the outputs of all previous units and cannot be processed in parallel, which wastes computing resources; moreover, its gradients sometimes vanish so that the weights cannot be updated accurately, making it difficult to generate coherent sentences that match the video content.
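The first shortcoming above can be illustrated with a toy sketch (not the patent's model): under cross-entropy training with teacher forcing, the decoder always conditions on the ground-truth previous token, but at inference it conditions on its own predictions, so an early mistake propagates into every later step. The per-token logit table `W` and the `step_logits` helper here are hypothetical stand-ins for a real decoder.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

rng = np.random.default_rng(0)
vocab, steps = 5, 4
W = rng.normal(size=(vocab, vocab))  # hypothetical per-token logit table

def step_logits(prev_token):
    # toy decoder: next-token logits depend only on the previous token id
    return W[prev_token]

# Teacher forcing: cross-entropy against the reference, always
# conditioning on the *ground-truth* previous token.
ref = [1, 3, 2, 4]
loss, prev = 0.0, 0
for t in range(steps):
    p = softmax(step_logits(prev))
    loss += -np.log(p[ref[t]])
    prev = ref[t]              # ground truth is fed back during training
print(f"teacher-forcing loss: {loss:.3f}")

# At inference the model feeds back its *own* prediction instead, so one
# early mistake shifts every later conditional distribution -- the
# "error accumulation" (exposure bias) described above.
prev, generated = 0, []
for t in range(steps):
    prev = int(np.argmax(step_logits(prev)))
    generated.append(prev)
print("greedy decode:", generated)
```

Reinforcement-learning objectives (e.g. self-critical training) sidestep this mismatch by scoring complete sampled sentences, at the cost of the high variance and slow convergence the paragraph notes.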

Method used




Embodiment Construction

[0049] The present invention will be further described below in conjunction with the accompanying drawings.

[0050] A video content description method based on a text autoencoder. The method centers on building a text autoencoder that learns the corresponding latent-space features and reconstructs the text with a multi-head attention residual network, so that it can generate text descriptions that better match the real content of the video and fully mine the latent relationships between video content semantics and video text descriptions. The self-attention network, composed of self-attention modules and fully connected mappings, can effectively capture long-range action sequence features in videos and improve the computational efficiency of the model, while enhancing the neural network's ability to fit data (that is, using the neural network to fit the text latent-space feature matrix) and thereby improving the quality of the video content description; the use of the multi-head attention residual network…
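As a minimal NumPy sketch of the "self-attention module plus fully connected mapping" pattern described above (random placeholder weights, not the patent's parameters): scaled dot-product self-attention computes all pairwise frame interactions in one matrix product, so long-range dependencies need no step-by-step recurrence.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a whole sequence at once.

    Every frame attends to every other frame in a single matrix product,
    so long-range action dependencies are captured without recurrence.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d))   # (T, T) attention weights
    return A @ V

rng = np.random.default_rng(1)
T, d_in, d_k = 8, 16, 16                # 8 video frames, hypothetical dims
X = rng.normal(size=(T, d_in))          # frame features (stand-in for CNN output)
Wq, Wk, Wv = (rng.normal(size=(d_in, d_k)) for _ in range(3))
Wfc = rng.normal(size=(d_k, d_k))

H = self_attention(X, Wq, Wk, Wv)       # computed in parallel over all T frames
Z = np.maximum(H @ Wfc, 0)              # fully connected mapping with ReLU
print(Z.shape)                          # (8, 16)
```

Because the attention matrix is produced by dense matrix multiplications rather than a time-unrolled loop, the whole sequence is processed in parallel, which is the efficiency gain over recurrent decoders that the paragraph claims.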



Abstract

The invention discloses a video content description method based on a text auto-encoder. The method comprises the following steps: first, constructing a convolutional neural network to extract two-dimensional and three-dimensional features of a video; second, constructing a text auto-encoder, that is, extracting text latent-space features with an encoder (a text convolutional network) and reconstructing the text with a decoder (a multi-head attention residual network); third, obtaining estimated text latent-space features through a self-attention mechanism and fully connected mapping; and finally, alternately optimizing the model with an adaptive moment estimation (Adam) algorithm and using the constructed text auto-encoder and convolutional neural network to obtain the corresponding video content description for a new video. Through the training of the text auto-encoder, the method can fully mine the latent relationship between video content semantics and video text description, and through the self-attention mechanism it captures the temporal action information of the video over a long time span, so that the computational efficiency of the model is improved and a text description more conforming to the real content of the video is generated.
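The decoder named in the abstract is a "multi-head attention residual network". A minimal sketch of that building block, with random placeholder weights and hypothetical dimensions (not the patent's parameters): several attention heads run in parallel on subspaces of the input, their outputs are concatenated and projected, and a residual (skip) connection adds the input back.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_residual(X, heads, rng):
    """One multi-head self-attention block with a residual connection.

    All weights here are random placeholders for illustration only.
    """
    T, d = X.shape
    dh = d // heads                      # per-head subspace width
    outs = []
    for _ in range(heads):
        Wq, Wk, Wv = (rng.normal(size=(d, dh)) * 0.1 for _ in range(3))
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        A = softmax(Q @ K.T / np.sqrt(dh))
        outs.append(A @ V)               # each head: (T, dh)
    Wo = rng.normal(size=(d, d)) * 0.1   # output projection
    return X + np.concatenate(outs, axis=1) @ Wo   # residual add

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 32))   # 6 text positions, hypothetical width 32
Y = multi_head_residual(X, heads=4, rng=rng)
print(Y.shape)                 # (6, 32)
```

The residual add keeps gradients flowing through deep stacks of such blocks, which is one standard reason this pattern avoids the vanishing-gradient problem attributed to recurrent decoders earlier in the document.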

Description

technical field [0001] The invention belongs to the technical field of computers, in particular to the technical field of video content description, and relates to a video content description method based on a text autoencoder. Background technique [0002] In recent years, with the continuous development of information technology and the iterative upgrading of smart devices, people are more inclined to convey information through video. This makes the scale of all kinds of video data larger and larger, but it also brings great challenges. For example, hundreds of thousands of videos are uploaded every minute to the servers of video content sharing websites; manually reviewing whether these videos comply with the rules is time-consuming and labor-intensive, whereas video description methods can significantly improve the efficiency of the review work and save a great deal of time and labor cost. Video content description technology can be widely used in practical scenarios such as …

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06K9/00, G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/08, G06V20/40, G06N3/047, G06N3/045, G06F18/2415, G06F18/241
Inventors: 李平 (Li Ping), 张致远 (Zhang Zhiyuan), 徐向华 (Xu Xianghua)
Owner HANGZHOU DIANZI UNIV