Video Description Method for Semantic Reconstruction Based on Temporal Gaussian Mixture Atrous Convolution

A video description technique based on temporal Gaussian mixture atrous convolution, applied in the computer field. It addresses problems such as vanishing or exploding gradients, the difficulty of effectively capturing long-term video sequence information, and the neglect of semantic differences at the sentence level, with the effects of narrowing semantic differences and reducing the number of training parameters.

Active Publication Date: 2022-03-22
HANGZHOU DIANZI UNIV

AI Technical Summary

Problems solved by technology

[0005] The shortcomings of the above methods are mainly the following: (1) because LSTM still suffers from vanishing or exploding gradients, it has difficulty capturing long-term video sequence information, which hinders learning feature representations of the video context; (2) natural sentences and videos are two data modalities with different structures, so it is difficult to convert the semantics of video content accurately into natural sentences, and a semantic gap remains between generated sentences and video content. Existing methods often use the cross-entropy loss function to reduce the semantic difference between generated sentences and videos at the word level, while ignoring semantic differences at the sentence level.
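To make the word-level point in (2) concrete, the following is a minimal sketch of a per-word cross-entropy loss. The function name and the toy probabilities are assumptions for illustration and do not come from the patent. Each decoding step is scored independently against its reference word, so nothing in the loss compares the sentence as a whole with the video.

```python
import math

def word_level_cross_entropy(step_probs, target_ids):
    """Mean negative log-likelihood of the reference words.

    step_probs: one probability distribution over the vocabulary per step.
    target_ids: the reference word index for each step.
    """
    total = 0.0
    for probs, target in zip(step_probs, target_ids):
        # Each step contributes on its own; no sentence-level term appears.
        total += -math.log(probs[target])
    return total / len(target_ids)

# Toy 3-word vocabulary and a 2-word reference sentence (hypothetical numbers).
step_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
loss = word_level_cross_entropy(step_probs, [0, 1])
```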


Embodiment Construction

[0039] The present invention is further described below with reference to the accompanying drawings.

[0040] As shown in Figure 1, the invention is a semantic reconstruction video description method based on temporal Gaussian mixture atrous convolution. The method first uniformly samples the original video, uses convolutional neural networks to extract appearance features and action features, and concatenates them along the feature dimension to obtain video features. The video features are then input into the temporal Gaussian mixture atrous convolution encoder to obtain temporal Gaussian features. Next, the temporal Gaussian features and the text description are input into the decoder, which outputs the probability distribution of the generated sentence and the hidden vectors. A semantic reconstruction network is then established and the semantic reconstruction loss is calculated. Stochastic gradient descent is used to optimize the video description model composed of the encoder, the decoder, and the semantic reconstruction network. For new videos, t...
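The encoder step can be pictured with a small sketch. This excerpt does not give the patent's exact parameterization, so everything below is an assumption for illustration: the temporal kernel is built as a softmax-weighted mixture of normalized Gaussians over the kernel taps, and it is applied as a dilated (atrous) 1-D convolution along the time axis of the frame features. All function names and numbers are hypothetical.

```python
import math

def gaussian_kernel(length, center, width):
    """Normalized Gaussian weights over `length` temporal taps."""
    raw = [math.exp(-((t - center) ** 2) / (2.0 * width ** 2)) for t in range(length)]
    total = sum(raw)
    return [r / total for r in raw]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def mixture_kernel(length, centers, widths, mix_logits):
    """Convex combination of Gaussian kernels, weighted by a softmax."""
    weights = softmax(mix_logits)
    kernels = [gaussian_kernel(length, c, w) for c, w in zip(centers, widths)]
    return [sum(wt * k[t] for wt, k in zip(weights, kernels)) for t in range(length)]

def atrous_temporal_conv(features, kernel, dilation):
    """Dilated 1-D convolution along time, applied to every feature channel.

    features: list of T frame vectors, each a list of D floats.
    Taps that fall outside the sequence are treated as zero padding,
    so the output has the same length T as the input.
    """
    T, D, L = len(features), len(features[0]), len(kernel)
    half = (L - 1) // 2
    out = []
    for t in range(T):
        row = [0.0] * D
        for k in range(L):
            src = t + (k - half) * dilation
            if 0 <= src < T:
                for d in range(D):
                    row[d] += kernel[k] * features[src][d]
        out.append(row)
    return out

# Toy input: 8 frames with a single feature channel equal to the frame index.
features = [[float(t)] for t in range(8)]
kernel = mixture_kernel(3, centers=[0.0, 2.0], widths=[1.0, 1.0], mix_logits=[0.0, 0.0])
out = atrous_temporal_conv(features, kernel, dilation=2)
```

In this sketch a 3-tap kernel with dilation 2 covers 5 frames, and the kernel itself is defined by only a few Gaussian parameters (centers, widths, mixture logits) rather than by one free weight per tap. That is one plausible reading of how such an encoder could widen its temporal receptive field while keeping the parameter count small.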


Abstract

The invention discloses a semantic reconstruction video description method based on temporal Gaussian mixture atrous convolution. The method first extracts appearance features and action features from sampled video frames that are accompanied by text descriptions, concatenates them, and inputs them into a temporal Gaussian mixture atrous convolution encoder to obtain temporal Gaussian features. The temporal Gaussian features and the text description are then input into a decoder to obtain the probability distribution of the generated sentence and the hidden vectors. A semantic reconstruction network is then built and the semantic reconstruction loss is calculated. The model is optimized with the stochastic gradient descent algorithm. For a new video, the probability distribution of the generated sentence is obtained through the above steps, and a greedy search algorithm is used to obtain the video description sentence. The method uses temporal Gaussian mixture atrous convolution to model the long-term temporal relationships of videos, and obtains the sentence-level probability distribution difference through the semantic reconstruction network. This narrows the semantic gap between generated sentences and video content, so that the generated natural sentences describe the video content more accurately.
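The greedy search mentioned above can be sketched as follows. The decoder here is a stand-in lookup table, and the vocabulary, probabilities, and function names are hypothetical. A real decoder would compute each step's distribution from the temporal Gaussian features and the words generated so far.

```python
def greedy_search(next_word_probs, start_id, end_id, max_len):
    """Pick the most probable word at every step until `end_id` or `max_len`."""
    tokens = [start_id]
    for _ in range(max_len):
        probs = next_word_probs(tokens)
        best = max(range(len(probs)), key=lambda i: probs[i])
        if best == end_id:
            break
        tokens.append(best)
    return tokens[1:]  # drop the start token

# Hypothetical toy vocabulary and next-word table keyed by the last token.
VOCAB = ["<s>", "</s>", "a", "dog", "runs"]
TABLE = {
    0: [0.0, 0.1, 0.7, 0.1, 0.1],
    2: [0.0, 0.1, 0.1, 0.6, 0.2],
    3: [0.0, 0.2, 0.1, 0.1, 0.6],
    4: [0.0, 0.8, 0.1, 0.05, 0.05],
}

def toy_decoder(tokens):
    # Stand-in for the trained decoder: looks only at the previous word.
    return TABLE[tokens[-1]]

words = [VOCAB[i] for i in greedy_search(toy_decoder, start_id=0, end_id=1, max_len=10)]
```

Greedy search commits to the single best word at each step, which is cheap but can miss sentences whose overall probability is higher; the source names greedy search, so the sketch does not substitute another decoding strategy.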

Description

Technical field

[0001] The invention belongs to the field of computer technology, in particular to video description in computer vision, and relates to a semantic reconstruction video description method based on temporal Gaussian mixture atrous convolution.

Background

[0002] The rapid development of the Internet has produced a variety of multimedia data resources, such as video, image, audio, and text. In recent years, with the popularity of smart terminals such as mobile phones and cameras and the substantial increase in Internet bandwidth, video platforms such as Douyin and Kuaishou have become popular among users. The webcast and self-media industries have risen rapidly, and tens of thousands of videos are generated and disseminated every day. The number of videos has grown explosively, which has had a great impact on people's daily life. In the era of big data, making effective use of massive video data is very important. Compared with d...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/71, G06K9/00, G06K9/62, G06N3/04, G06N3/08
CPC: G06F16/71, G06N3/08, G06N3/045, G06F18/2411, G06F18/2415
Inventor: 李平, 张盼, 蒋昕怡, 徐向华
Owner: HANGZHOU DIANZI UNIV