Video description method based on high-order low-rank multi-modal attention mechanism

A video description method using multi-modal technology, applied in the field of computer vision. It addresses the problem that ignoring the correlation information between multi-modal features reduces video description accuracy, and it aims to improve accuracy and efficiency while offering good application value.

Active Publication Date: 2020-02-21

ZHEJIANG UNIV

Cites: 7 | Cited by: 3

AI Technical Summary

Problems solved by technology

The decoder generally uses a separate recurrent neural network combined with an attention mechanism, but current attention mechanisms ignore the correlation information between multi-modal features, which reduces the accuracy of video description.
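To make this limitation concrete, here is a minimal sketch of attention applied to each modality separately. It assumes two modalities (hypothetically called appearance and motion), simple dot-product scoring, and made-up toy values; none of these names or choices come from the patent text. The point it illustrates is that each set of attention weights depends only on the decoder state and one modality's features, so the two modalities never interact when the weights are computed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden, feats):
    """Dot-product attention over the time steps of ONE modality.

    hidden: decoder state (list of d floats); hypothetical input.
    feats:  T feature vectors, each of the same length d.
    Returns (weights over the T steps, weighted-sum context vector).
    """
    scores = [sum(h * x for h, x in zip(hidden, f)) for f in feats]
    weights = softmax(scores)
    context = [sum(w * f[i] for w, f in zip(weights, feats))
               for i in range(len(hidden))]
    return weights, context


# Toy data (made up): 3 time steps, 2-dimensional features per modality.
hidden = [1.0, 0.0]
appearance = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
motion = [[0.0, 1.0], [3.0, 0.0], [0.0, 0.0]]

# Each modality is attended on its own; neither call sees the other modality.
w_app, ctx_app = attend(hidden, appearance)
w_mot, ctx_mot = attend(hidden, motion)
print(w_app, w_mot)
```

In this toy run the appearance weights peak at the third time step and the motion weights at the second, and replacing the motion features would leave the appearance weights unchanged.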

Embodiment Construction

[0047] In order to make the object, technical solution and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit it.

[0048] On the contrary, the invention covers any alternatives, modifications, equivalent methods and schemes within the spirit and scope of the invention as defined by the claims. Further, to give the public a better understanding of the present invention, some specific details are described in the detailed description below. Those skilled in the art can fully understand the present invention even without the description of these details.

[0049] Referring to Figure 1, in a preferred embodiment of the present invention, the video description g...

Abstract

The invention discloses a video description method based on a high-order low-rank multi-modal attention mechanism, which generates a short and accurate description for a given video clip. The method comprises the following steps: obtaining a video data set for training a video description generation model, and defining the algorithm target; modeling the time-sequence multi-modal features in the video data set; establishing a high-order low-rank multi-modal attention mechanism on the decoder based on the time-sequence multi-modal features; and generating a description of an input video using the model. The method is suitable for video description generation in real video scenes, and is effective and robust under various complex conditions.
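The text available here does not include the attention formula itself, so the following is only a generic sketch of what a high-order, low-rank attention score can look like, not the patented formulation. It assumes two modalities (hypothetically appearance and motion), a third-order interaction between both modalities and the decoder state, and a rank-R factorization with illustrative matrices U, V, W and vector p. Every name and number below is an assumption made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def joint_scores(hidden, app_feats, mot_feats, U, V, W, p):
    """Low-rank trilinear score for every time step.

    Each input is projected to R dimensions, the three projections are
    multiplied element-wise, and p mixes the R products into one score:
        score_t = sum_r p[r] * (U a_t)[r] * (V m_t)[r] * (W h)[r]
    This equals contracting a, m and h with a rank-R third-order tensor,
    without ever materialising that tensor.
    """
    h_proj = matvec(W, hidden)
    scores = []
    for a_t, m_t in zip(app_feats, mot_feats):
        a_proj = matvec(U, a_t)
        m_proj = matvec(V, m_t)
        fused = [a * m * h for a, m, h in zip(a_proj, m_proj, h_proj)]
        scores.append(sum(w * z for w, z in zip(p, fused)))
    return scores


# Toy parameters and data (all made up): rank R = 2, feature size 2, T = 3.
U = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 1.0], [1.0, -1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
p = [1.0, 1.0]
hidden = [1.0, 0.5]
appearance = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
motion = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

scores = joint_scores(hidden, appearance, motion, U, V, W, p)
weights = softmax(scores)  # one shared set of weights over the time steps
print(scores, weights)
```

The design choice illustrated is the parameter count: a full third-order weight tensor over feature sizes d_a, d_m and d_h needs d_a × d_m × d_h entries, while the factorized form needs only R × (d_a + d_m + d_h) plus the R entries of p. Unlike per-modality attention, the score at each time step here changes whenever either modality's features change.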

Description

Technical field

[0001] The invention belongs to the field of computer vision, and in particular relates to a video description method based on a high-order low-rank multi-modal attention mechanism.

Background

[0002] Video has become an indispensable and ubiquitous part of modern society. This environment has driven considerable progress in research on the semantic content of video. At present, most video research concentrates on lower-level tasks such as classification and detection. Thanks to the development of recurrent neural networks, the new task of video description generation has also come into view: given a video clip, a trained network model automatically generates a sentence describing it. The task has wide real-world application. For example, about 100 hours of videos are generated every minute on YouTube. If the generated video resources ar...

Application Information

IPC(8): G06K9/00, G06N3/04, G06N3/08

CPC: G06N3/08, G06V20/41, G06N3/045

Inventors: 金涛, 李英明, 张仲非

Owner: ZHEJIANG UNIV
