Video content description method based on text auto-encoder

An auto-encoder-based video content description technology, applied in the computer field, which can solve the problems of ignoring the guiding role of text in parameter updating, wasting computing resources, and being unable to update weights accurately.

Active Publication Date: 2020-04-28
HANGZHOU DIANZI UNIV
Cites: 5 · Cited by: 21

AI Technical Summary

Problems solved by technology

[0005] The shortcomings of the above methods are mainly manifested in three aspects. First, mainstream video description methods compute the loss with cross-entropy, which suffers from error accumulation; reinforcement learning can avoid this drawback, but it is computationally expensive and difficult to converge. Second, the above methods consider only the features of the video and do not make full use of the rich features contained in the video text, ignoring the guiding role of the text, as prior information, in updating the description model's parameters. Third, a recurrent neural network has a sequential structure: the unit at the current time step depends on the outputs of all previous units and cannot be processed in parallel, which wastes computing resources; moreover, its gradients sometimes vanish so that the weights cannot be updated accurately, making it difficult to generate coherent sentences that match the video content.
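The first shortcoming above can be illustrated with a toy sketch (not the patent's model): under cross-entropy training with teacher forcing, the decoder always conditions on the ground-truth previous token, but at inference it conditions on its own predictions, so an early mistake propagates into every later step. The per-token logit table `W` and the `step_logits` helper here are hypothetical stand-ins for a real decoder.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

rng = np.random.default_rng(0)
vocab, steps = 5, 4
W = rng.normal(size=(vocab, vocab))  # hypothetical per-token logit table

def step_logits(prev_token):
    # toy decoder: next-token logits depend only on the previous token id
    return W[prev_token]

# Teacher forcing: cross-entropy against the reference, always
# conditioning on the *ground-truth* previous token.
ref = [1, 3, 2, 4]
loss, prev = 0.0, 0
for t in range(steps):
    p = softmax(step_logits(prev))
    loss += -np.log(p[ref[t]])
    prev = ref[t]              # ground truth is fed back during training
print(f"teacher-forcing loss: {loss:.3f}")

# At inference the model feeds back its *own* prediction instead, so one
# early mistake shifts every later conditional distribution -- the
# "error accumulation" (exposure bias) described above.
prev, generated = 0, []
for t in range(steps):
    prev = int(np.argmax(step_logits(prev)))
    generated.append(prev)
print("greedy decode:", generated)
```

Reinforcement-learning objectives (e.g. self-critical training) sidestep this mismatch by scoring complete sampled sentences, at the cost of the high variance and slow convergence the paragraph notes.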

Method used




Embodiment Construction

[0049] The present invention will be further described below in conjunction with the accompanying drawings.

[0050] A video content description method based on a text autoencoder. The method centers on building a text autoencoder that learns the corresponding latent-space features and reconstructs the text with a multi-head attention residual network, so that it can generate text descriptions that better match the real content of the video and fully mine the latent relationships between video content semantics and video text descriptions. The self-attention network, composed of self-attention modules and fully connected mappings, can effectively capture long-range action sequence features in videos and improve the computational efficiency of the model, while enhancing the neural network's ability to fit data (that is, using the neural network to fit the text latent-space feature matrix) and thereby improving the quality of the video content description; the use of the multi-head attention residual network…
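As a minimal NumPy sketch of the "self-attention module plus fully connected mapping" pattern described above (random placeholder weights, not the patent's parameters): scaled dot-product self-attention computes all pairwise frame interactions in one matrix product, so long-range dependencies need no step-by-step recurrence.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a whole sequence at once.

    Every frame attends to every other frame in a single matrix product,
    so long-range action dependencies are captured without recurrence.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d))   # (T, T) attention weights
    return A @ V

rng = np.random.default_rng(1)
T, d_in, d_k = 8, 16, 16                # 8 video frames, hypothetical dims
X = rng.normal(size=(T, d_in))          # frame features (stand-in for CNN output)
Wq, Wk, Wv = (rng.normal(size=(d_in, d_k)) for _ in range(3))
Wfc = rng.normal(size=(d_k, d_k))

H = self_attention(X, Wq, Wk, Wv)       # computed in parallel over all T frames
Z = np.maximum(H @ Wfc, 0)              # fully connected mapping with ReLU
print(Z.shape)                          # (8, 16)
```

Because the attention matrix is produced by dense matrix multiplications rather than a time-unrolled loop, the whole sequence is processed in parallel, which is the efficiency gain over recurrent decoders that the paragraph claims.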



Abstract

The invention discloses a video content description method based on a text auto-encoder. The method comprises the following steps: first, constructing a convolutional neural network to extract two-dimensional and three-dimensional features of a video; second, constructing a text auto-encoder, that is, extracting text latent-space features with an encoder (a text convolutional network) and reconstructing the text with a decoder (a multi-head attention residual network); third, obtaining estimated text latent-space features through a self-attention mechanism and fully connected mapping; and finally, alternately optimizing the model with an adaptive moment estimation (Adam) algorithm and using the constructed text auto-encoder and convolutional neural network to obtain the corresponding video content description for a new video. Through the training of the text auto-encoder, the method can fully mine the latent relationship between video content semantics and video text description, and through the self-attention mechanism it captures the temporal action information of the video over a long time span, so that the computational efficiency of the model is improved and a text description more conforming to the real content of the video is generated.
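The decoder named in the abstract is a "multi-head attention residual network". A minimal sketch of that building block, with random placeholder weights and hypothetical dimensions (not the patent's parameters): several attention heads run in parallel on subspaces of the input, their outputs are concatenated and projected, and a residual (skip) connection adds the input back.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_residual(X, heads, rng):
    """One multi-head self-attention block with a residual connection.

    All weights here are random placeholders for illustration only.
    """
    T, d = X.shape
    dh = d // heads                      # per-head subspace width
    outs = []
    for _ in range(heads):
        Wq, Wk, Wv = (rng.normal(size=(d, dh)) * 0.1 for _ in range(3))
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        A = softmax(Q @ K.T / np.sqrt(dh))
        outs.append(A @ V)               # each head: (T, dh)
    Wo = rng.normal(size=(d, d)) * 0.1   # output projection
    return X + np.concatenate(outs, axis=1) @ Wo   # residual add

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 32))   # 6 text positions, hypothetical width 32
Y = multi_head_residual(X, heads=4, rng=rng)
print(Y.shape)                 # (6, 32)
```

The residual add keeps gradients flowing through deep stacks of such blocks, which is one standard reason this pattern avoids the vanishing-gradient problem attributed to recurrent decoders earlier in the document.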

Description

technical field [0001] The invention belongs to the technical field of computers, in particular to the technical field of video content description, and relates to a video content description method based on a text autoencoder. Background technique [0002] In recent years, with the continuous development of information technology and the iterative upgrading of smart devices, people are more inclined to convey information through video. This makes the scale of all kinds of video data larger and larger, but it also brings great challenges. For example, hundreds of thousands of videos are uploaded every minute to the servers of video content sharing websites; manually reviewing whether these videos comply with the rules is time-consuming and labor-intensive, whereas video description methods can significantly improve the efficiency of the review work and save a great deal of time and labor cost. Video content description technology can be widely used in practical scenarios such as …

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06K9/00, G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/08, G06V20/40, G06N3/047, G06N3/045, G06F18/2415, G06F18/241
Inventors: 李平 (Li Ping), 张致远 (Zhang Zhiyuan), 徐向华 (Xu Xianghua)
Owner HANGZHOU DIANZI UNIV