Video Description Method for Semantic Reconstruction Based on Temporal Gaussian Mixture Atrous Convolution

A video description technique based on temporal Gaussian mixture modeling, in the computer field. It addresses gradient explosion, the difficulty of capturing long-term information in video sequences, and the neglect of sentence-level semantic differences, with the aim of narrowing the semantic gap and reducing the number of training parameters.

Active Publication Date: 2022-03-22

HANGZHOU DIANZI UNIV

Cites: 3 | Cited by: 0

AI Technical Summary

Problems solved by technology

[0005] The shortcomings of the above methods are mainly the following. (1) Because LSTMs still suffer from vanishing or exploding gradients, they struggle to capture long-term information in video sequences, which hinders learning feature representations of the video context. (2) Natural sentences and videos are two data modalities with different structures, so it is difficult to convert the semantics of video content accurately into natural sentences, and a semantic gap remains between the generated sentences and the video content. Existing methods usually rely on the cross-entropy loss to reduce this semantic difference at the word level, while ignoring semantic differences at the sentence level.
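Point (2) refers to the usual word-level training objective. The minimal Python sketch below (function and argument names are illustrative, not taken from the patent) computes that cross-entropy for one reference sentence under teacher forcing. Each term compares a single predicted distribution with a single reference word, which is why such a loss says nothing direct about sentence-level meaning.

```python
import math
from typing import Sequence


# Illustrative helper; the name and signature are not from the patent.
def word_level_cross_entropy(step_probs: Sequence[Sequence[float]],
                             target_ids: Sequence[int]) -> float:
    """Sum of per-word negative log-likelihoods for one sentence.

    step_probs[t] is the decoder's probability distribution over the
    vocabulary at step t (teacher forcing: the decoder saw the reference
    words before t); target_ids[t] is the reference word at step t.
    """
    if len(step_probs) != len(target_ids):
        raise ValueError("need exactly one distribution per reference word")
    loss = 0.0
    for probs, target in zip(step_probs, target_ids):
        loss -= math.log(probs[target])   # penalty for this word only
    return loss


# Two decoding steps over a three-word vocabulary.
print(word_level_cross_entropy([[0.5, 0.25, 0.25], [0.1, 0.8, 0.1]], [0, 1]))
```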

Embodiment Construction

[0039] The present invention is further described below with reference to the accompanying drawings.

[0040] As shown in Figure 1, the semantic reconstruction video description method based on temporal Gaussian mixture atrous convolution proceeds as follows. The method first uniformly samples the original video, uses convolutional neural networks to extract appearance features and action features, and concatenates them along the feature dimension to obtain the video features; a temporal Gaussian mixture atrous convolution encoder then produces temporal Gaussian video features; these features and the text description are input into the decoder, which outputs the probability distribution of the generated sentence and the hidden vectors; a semantic reconstruction network is then built and the semantic reconstruction loss is computed; stochastic gradient descent is used to optimize the video description model composed of the encoder, the decoder and the semantic reconstruction network; for new videos, t...
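The sampling, splicing and encoding steps above can be illustrated with a small NumPy sketch. The excerpt does not give the exact layer definition, so the following are assumptions: each layer's temporal kernel is a softmax-weighted mixture of Gaussians, the kernel is shared across feature channels, dilations grow as 1, 2, 4, a ReLU follows each layer, and random parameters stand in for trained ones. All function names are hypothetical.

```python
import numpy as np


def uniform_sample_indices(num_frames: int, num_samples: int) -> np.ndarray:
    """Indices of `num_samples` frames spread evenly over the whole video."""
    return np.linspace(0, num_frames - 1, num_samples).astype(int)


def gaussian_mixture_kernel(centers, widths, mix_logits, length):
    """Temporal kernel of `length` taps built as a mixture of M Gaussians.

    Assumed parameterization: centers and widths (shape (M,)) are in tap
    coordinates scaled to [-1, 1]; a softmax over mix_logits gives weights.
    """
    taps = np.linspace(-1.0, 1.0, length)
    g = np.exp(-0.5 * ((taps[None, :] - centers[:, None]) / widths[:, None]) ** 2)
    g /= g.sum(axis=1, keepdims=True)            # each Gaussian sums to 1
    w = np.exp(mix_logits - mix_logits.max())
    w /= w.sum()                                  # softmax mixture weights
    return w @ g                                  # shape (length,)


def atrous_temporal_conv(features, kernel, dilation):
    """Dilated (atrous) 1-D convolution along time with zero 'same' padding.

    features has shape (T, D); in this sketch one temporal kernel is shared
    by all D channels, so each layer has very few parameters.
    """
    T, D = features.shape
    L = len(kernel)
    if L % 2 == 0:
        raise ValueError("use an odd kernel length so the output stays aligned")
    half = (L - 1) // 2 * dilation
    padded = np.pad(features, ((half, half), (0, 0)))
    out = np.zeros((T, D))
    for k in range(L):
        out += kernel[k] * padded[k * dilation : k * dilation + T]
    return out


def receptive_field(length, dilations):
    """Number of frames seen by one output step after stacking the layers."""
    return 1 + (length - 1) * sum(dilations)


def tgm_atrous_encoder(appearance, motion, dilations=(1, 2, 4),
                       length=3, num_gaussians=2, seed=0):
    """Splice appearance and motion features, then stack TGM atrous layers.

    Kernel parameters are drawn at random as stand-ins for learned ones.
    """
    h = np.concatenate([appearance, motion], axis=1)   # (T, Da + Dm)
    rng = np.random.default_rng(seed)
    for d in dilations:
        centers = np.tanh(rng.normal(size=num_gaussians))      # in (-1, 1)
        widths = 0.1 + np.abs(rng.normal(size=num_gaussians))  # strictly > 0
        mix_logits = rng.normal(size=num_gaussians)
        kernel = gaussian_mixture_kernel(centers, widths, mix_logits, length)
        h = np.maximum(atrous_temporal_conv(h, kernel, d), 0.0)  # ReLU
    return h


idx = uniform_sample_indices(300, 16)
rng = np.random.default_rng(1)
appearance = rng.normal(size=(len(idx), 8))   # stand-in appearance features
motion = rng.normal(size=(len(idx), 4))       # stand-in action features
print(tgm_atrous_encoder(appearance, motion).shape)
print(receptive_field(3, (1, 2, 4)))
```

Doubling the dilation at each layer widens the temporal receptive field (15 frames for three 3-tap layers here) without adding parameters, which is one plausible reading of why atrous convolution is preferred to an LSTM for long-term context.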

Abstract

The invention discloses a semantic reconstruction video description method based on temporal Gaussian mixture atrous convolution. The method first extracts appearance features and action features from sampled video frames that have text descriptions, concatenates them, and inputs the result into a temporal Gaussian mixture atrous convolution encoder to obtain temporal Gaussian features; these features and the text description are then input into a decoder to obtain the probability distribution of the generated sentence and the hidden vectors; a semantic reconstruction network is then built and the semantic reconstruction loss is calculated; finally, the model is optimized with stochastic gradient descent. For a new video, the probability distribution of the generated sentence is obtained through the above steps, and a greedy search algorithm produces the descriptive sentence. The method uses temporal Gaussian mixture atrous convolution to model the long-term temporal relationships in videos and uses the semantic reconstruction network to obtain the sentence-level probability distribution difference, which narrows the semantic gap between the generated sentences and the video content and thus produces natural sentences that describe the video content more accurately.
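The final inference step, greedy search, can be sketched in a few lines of Python. `step_fn`, the token ids and the toy vocabulary below are hypothetical stand-ins; the text shown here does not specify the decoder's interface.

```python
from typing import Callable, List, Sequence


def greedy_decode(step_fn: Callable[[List[int]], Sequence[float]],
                  bos_id: int, eos_id: int, max_len: int = 20) -> List[int]:
    """Greedy search: at each step keep only the single most probable word.

    step_fn(prefix) is a hypothetical stand-in for the trained decoder
    conditioned on the video's features; it returns a probability for
    every vocabulary word given the words generated so far.
    """
    tokens = [bos_id]
    for _ in range(max_len):
        probs = step_fn(tokens)
        next_id = max(range(len(probs)), key=probs.__getitem__)  # argmax
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]   # drop the <bos> marker


# Toy decoder that follows a fixed script, for illustration only.
vocab = ["<bos>", "<eos>", "a", "man", "is", "running"]
script = [2, 3, 4, 5, 1]


def toy_step(prefix: List[int]) -> List[float]:
    probs = [0.01] * len(vocab)
    probs[script[len(prefix) - 1]] = 0.9
    return probs


print(" ".join(vocab[i] for i in greedy_decode(toy_step, bos_id=0, eos_id=1)))
```

Greedy search commits to one word per step, so it is cheap but can miss sentences that beam search would find; the excerpt names greedy search, so the sketch keeps to it.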

Description

Technical field

[0001] The invention belongs to the field of computer technology, specifically to video description in computer vision, and relates to a semantic reconstruction video description method based on temporal Gaussian mixture atrous convolution.

Background

[0002] The rapid development of the Internet has produced a wide variety of multimedia data, such as video, images, audio and text. In recent years, with the spread of smart terminals such as mobile phones and cameras and a substantial increase in Internet bandwidth, video platforms such as Douyin and Kuaishou have become popular with users, and the webcast and self-media industries have risen rapidly. Tens of thousands of videos are generated and disseminated every day, and the number of videos has grown explosively, greatly affecting people's daily lives. In the era of big data, making effective use of massive amounts of video is very important. Compared with d...

Application Information

Patent Type & Authority: Patents (China)

IPC (8): G06F16/71; G06K9/00; G06K9/62; G06N3/04; G06N3/08

CPC: G06F16/71; G06N3/08; G06N3/045; G06F18/2411; G06F18/2415

Inventors: 李平 (Li Ping), 张盼 (Zhang Pan), 蒋昕怡 (Jiang Xinyi), 徐向华 (Xu Xianghua)

Owner: HANGZHOU DIANZI UNIV
