Full convolution video description generation method based on self-optimization mechanism

A video description and self-optimization technology, applied in the field of cross-media generation learning. It addresses the difficulty of training recurrent neural networks, their inability to be parallelized, and their long gradient propagation paths, so as to improve usability and user experience, produce rich natural language descriptions, and train quickly.

Active Publication Date: 2020-07-28

FUDAN UNIV

17 Cites, 8 Cited by


Problems solved by technology

Recurrent neural networks perform well on sequence tasks, but their computing units are very complex. Because the network is unrolled over time, the gradient propagation path is very long. In addition, each time step requires the output of the previous time step as its input, so a recurrent neural network cannot be parallelized across time steps during training.

These problems make training recurrent neural networks difficult and time-consuming, which has led researchers to look for model structures that solve sequence problems without recurrence. Such structures have achieved major breakthroughs.
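The contrast described above can be illustrated with a toy example. This is a minimal sketch only, not the patent's network: the scalar weights, the three-tap kernel, and the function names `recurrent_pass` and `causal_conv_pass` are all hypothetical. It shows that a recurrent layer must process time steps one after another, while each output of a causal convolution depends only on a fixed window of inputs and could be computed independently of the others.

```python
import math


def recurrent_pass(inputs, w_in=0.5, w_rec=0.9):
    """Toy scalar recurrent layer: step t cannot start until step t-1 is done."""
    hidden = 0.0
    outputs = []
    for x in inputs:  # strictly sequential: each step reads the previous hidden state
        hidden = math.tanh(w_in * x + w_rec * hidden)
        outputs.append(hidden)
    return outputs


def causal_conv_pass(inputs, kernel=(0.2, 0.3, 0.5)):
    """Toy causal 1-D convolution: output t reads only inputs t-k+1 .. t."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(inputs)  # left padding keeps the window causal
    # No output depends on another output, so every position could run in parallel.
    return [
        sum(w * padded[t + i] for i, w in enumerate(kernel))
        for t in range(len(inputs))
    ]
```

In the convolutional version, the path from any output back to an input it depends on has a fixed length per layer, whereas in the recurrent version the path grows with the distance between time steps.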


Embodiment Construction

[0046] The following is a detailed introduction to the fully convolutional video description generation method based on the self-optimization mechanism with reference to the accompanying drawings.

[0047] As shown in Figure 1, the specific steps of the present invention comprise:

[0048] Step 1. Collect the required video data from the multimedia data set, and obtain the videos and their annotated descriptions.

[0049] In step 1, a video usually has multiple corresponding natural language descriptions, and annotation words that are infrequent or useless across the entire data set are filtered out. The filtering steps are as follows:

[0050] Step 1.1: Count the frequency of every annotation word across the data set;

[0051] Step 1.2: Filter out meaningless words that contain digits;

[0052] Step 1.3: For each image annotation, words that appear less frequently in the entire dataset are considered as relatively minor information in the image and del...
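Steps 1.1 to 1.3 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation: the lowercasing, the whitespace tokenization, the `min_count` threshold, and the function names are hypothetical. The text of Step 1.3 is cut off in the source, so treating low-frequency words as removed from each annotation is also an assumption, based on the statement in [0049].

```python
from collections import Counter


def build_vocabulary(captions, min_count=2):
    """Return the set of words kept after the frequency and digit filters."""
    # Step 1.1: count how often each word occurs across all annotations.
    counts = Counter(
        word for caption in captions for word in caption.lower().split()
    )
    # Step 1.2: discard words containing digits.
    # Step 1.3 (assumed): discard words rarer than the hypothetical min_count.
    return {
        word
        for word, count in counts.items()
        if count >= min_count and not any(ch.isdigit() for ch in word)
    }


def filter_caption(caption, vocabulary):
    """Keep only the words of one annotation that survive the filters."""
    return [word for word in caption.lower().split() if word in vocabulary]
```

For example, `build_vocabulary` would be run once over every annotation in the data set, and `filter_caption` would then be applied to each annotation before training.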


Abstract

The invention relates in particular to a cross-modal description generation method for video. The method comprises three main algorithmic parts: video content understanding, salient visual information acquisition, and natural language description generation. A novel convolutional neural network, rather than a traditional recurrent neural network, is used as the basic model to generate natural language descriptions for a video data set. A new attention mechanism is designed according to the characteristics of the stacked structure; it calculates the relevance between the current word and the visual representation and, at the same time, obtains the most critical visual information at each time step. Compared with traditional video description generation methods, the visual information attended to is more accurate, and the generated natural language descriptions are more accurate and closer to everyday expression. Because it takes into account the multi-modal information shared between video and text, the method is significant for video understanding and expression, can improve the model's ability to understand visual information and enhance the user experience, and has wide application value in the field of cross-media information understanding.
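The abstract does not give the formula of its attention mechanism, so the following is only a generic stand-in: plain dot-product attention with a softmax, written to show what "relevance between the current word and the visual representation" can look like in code. The names `attend`, `word_state`, and `frame_features` are hypothetical, and the patent's stacked-structure design is not reproduced here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(word_state, frame_features):
    """Weight each frame feature by its relevance to the current word state."""
    # Relevance score: dot product between the word state and each frame vector.
    scores = [
        sum(q * v for q, v in zip(word_state, frame)) for frame in frame_features
    ]
    weights = softmax(scores)
    dim = len(frame_features[0])
    # Context vector: relevance-weighted sum of the frame features.
    context = [
        sum(w * frame[d] for w, frame in zip(weights, frame_features))
        for d in range(dim)
    ]
    return weights, context
```

The frame with the largest weight plays the role of the most relevant visual information for the word being generated at that step.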

Description

Technical field

[0001] The invention belongs to the technical field of cross-media generation and learning, and in particular relates to a fully convolutional video description generation method based on a self-optimization mechanism.

[0002] Technical background

[0003] With the development of communication and storage technology, the video data in the network is increasing continuously. Compared with images and text, video contains more information and is easier to understand, which makes video a better information carrier in many cases. While understanding video is easy for humans, it is difficult for computers to do the job. Video captioning is a very important visual understanding task, which is to generate a natural language description for the provided video to describe the main information of the video, so that the semantic information contained in the video can be understood very concisely through the natural language description. The video description gen...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): H04N21/84, H04N21/44, G06N3/04

CPC: H04N21/84, H04N21/44008, G06N3/045

Inventor: 张玥杰, 房琨城, 周练, 张涛

Owner: FUDAN UNIV
