Video description method based on target space semantic alignment

A video description technology based on target space semantic alignment, applied in the field of computer vision. It addresses the problems of the attention mechanism's high time and space complexity, neglected word-object correspondences, and poor real-time performance, achieving the effects of improved operating efficiency and accuracy.

Active Publication Date: 2022-03-08
HANGZHOU DIANZI UNIV

AI Technical Summary

Problems solved by technology

[0005] The above video description methods have the following main deficiencies: (1) during feature extraction, only the two-dimensional static features and three-dimensional dynamic features of the video are considered, while the relationships between target objects in the video are not fully considered; this often causes semantic confusion in the generated sentences, such as combining two unrelated target words. (2) When the attention mechanism is applied, only the correspondence between words and video frames is modeled, while the correspondence between words and the target objects within those frames is ignored, so the generated sentences may describe irrelevant target objects. (3) The time and space complexity of the traditional attention mechanism is too high, growing quadratically with the number of video frames, which makes it difficult to apply in practical tasks with strict real-time requirements.
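To make deficiency (3) concrete, the following sketch shows why plain dot-product self-attention over frame features incurs time and memory costs that grow quadratically with the number of frames T. This is a generic illustration rather than the patent's model, and all tensor names are hypothetical.

```python
import torch

T, d = 128, 512                          # number of sampled frames, feature dimension
frames = torch.randn(T, d)               # per-frame feature vectors

# Dot-product self-attention over frames: the score matrix has shape (T, T),
# so both time and memory grow quadratically with the number of frames.
scores = frames @ frames.T / d ** 0.5    # (T, T) pairwise frame similarities
weights = torch.softmax(scores, dim=-1)  # (T, T) attention weights
attended = weights @ frames              # (T, d) attended frame features
```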




Embodiment Construction

[0062] The present invention will be further described below in conjunction with the accompanying drawings.

[0063] As shown in Figure 1, a video description method based on target space semantic alignment first uniformly samples the video and extracts its video feature vectors, target feature vectors, and mask set. The video mask set is then input into the target space adjacency module to obtain the target adjacency matrix. The target adjacency matrix and the target feature vectors are used jointly to construct the target adjacency features, and the word selection module yields the candidate word set. The target adjacency features, the video feature vectors, and the candidate word set are input into the target semantic alignment module to realize semantic alignment. The resulting semantic alignment vector is input into the attention-language memory module to generate the final sentence. This method can not only capture the spatial relationships between target objects but also achieve improved accuracy and operating efficiency.
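The data flow of paragraph [0063] can be summarized in the sketch below. The module names, their call interfaces, and the simple matrix-product fusion of adjacency matrix and object features are assumptions made for illustration, not the patent's actual implementation.

```python
def describe_video(frames, targets, masks, modules):
    """Hypothetical data flow of the method in [0063]; each entry of `modules`
    is a placeholder callable standing in for the component of the same name."""
    video_feat  = modules["video_encoder"](frames)         # video feature vectors
    target_feat = modules["target_encoder"](targets)       # target (object) feature vectors
    adjacency   = modules["spatial_adjacency"](masks)      # target adjacency matrix
    adj_feat    = adjacency @ target_feat                  # assumed fusion into adjacency features
    candidates  = modules["word_selection"](video_feat)    # candidate word set
    aligned     = modules["semantic_alignment"](adj_feat, video_feat, candidates)
    return modules["attention_language_memory"](aligned)   # final sentence
```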



Abstract

The invention discloses a video description method based on target space semantic alignment. The method first extracts appearance features and action features from sampled video frames paired with text descriptions, concatenates them, and feeds them into a temporal Gaussian mixture dilated convolution encoder to obtain temporal Gaussian features. A decoder built from a two-layer long short-term memory (LSTM) network produces the probability distributions and hidden vectors of the generated sentences. A semantic reconstruction network is then established and the semantic reconstruction loss is computed, and the model is optimized with a stochastic gradient descent algorithm. For a new video, the above steps are carried out in sequence to obtain the sentence generation probability distributions, and the video description sentence is obtained with a greedy search algorithm. By modeling the long-range temporal relations of the video with temporal Gaussian mixture dilated convolutions and measuring the sentence-level probability distribution difference through the semantic reconstruction network, the method reduces the semantic gap between the generated sentences and the video content and generates natural sentences that describe the video content more accurately.
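The final decoding step in the abstract, obtaining the sentence from the per-step word distributions with a greedy search, can be sketched as follows. The decoder's call signature and the token-ID conventions are assumptions for illustration, not the patent's interface.

```python
import torch

def greedy_decode(decoder, video_feat, bos_id, eos_id, max_len=20):
    """Greedy search: at each step keep only the highest-probability word.
    `decoder` stands in for the two-layer LSTM decoder; its interface
    (previous word, video features, state) -> (logits, state) is assumed."""
    tokens, state = [bos_id], None
    for _ in range(max_len):
        logits, state = decoder(tokens[-1], video_feat, state)
        next_id = int(torch.argmax(logits))   # most probable next word
        if next_id == eos_id:                 # stop at end-of-sentence
            break
        tokens.append(next_id)
    return tokens[1:]                         # drop the start-of-sentence token
```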

Description

Technical Field

[0001] The invention belongs to the technical field of computer vision, in particular to visual content understanding and analysis, and relates to a video description method based on target space semantic alignment.

Background Technique

[0002] In the era of Internet+ 2.0, people, machines, and things are closely connected by networks and edge devices, and information transmission is very important. As a multimedia data form whose share keeps increasing, video contains richer visual features than text and images. Accurately understanding video content has become an urgent need for practical applications such as video surveillance, autonomous driving, and navigation for the visually impaired. Describing video content in natural language that is easy for humans to understand is an important research direction in visual understanding, known as video description.

[0003] The video description task is to describe the video content with ...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F16/78; G06N3/04; G06N3/08
CPC: G06F16/7867; G06N3/08; G06N3/047; G06N3/045
Inventors: 李平 (Li Ping), 王涛 (Wang Tao), 李佳晖 (Li Jiahui), 徐向华 (Xu Xianghua)
Owner: HANGZHOU DIANZI UNIV