Video description method based on dual-path fractal network and LSTM

A video description and fractal network technology, applied in the fields of video description and deep learning

Status: Inactive | Publication date: 2017-07-07
SOUTH CHINA UNIV OF TECH

AI Technical Summary

Problems solved by technology

However, for a machine, it is a challenging task to generate a natural language description of a video.




Embodiment Construction

[0053] The present invention will be further described in detail below in conjunction with the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.

[0054] Key frames of the video to be described are sampled, and the optical flow features between two adjacent frames of the original video are extracted. The key frames and the optical flow features are then passed through two fractal networks to learn high-level feature expressions, which are input into two recurrent neural network models based on LSTM units. Finally, the output values of the two independent recurrent neural network models at each moment are weighted and averaged to obtain the description sentence corresponding to the video.
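As a rough illustration of this dual-path decoding, the PyTorch sketch below feeds two precomputed high-level feature vectors into two independent LSTM decoders and averages their word distributions at every time step. All layer sizes, the fusion weight alpha, and the names (CaptionLSTM, describe, rgb_feat, flow_feat) are illustrative assumptions, not the patent's exact configuration; the fractal feature extractors are abstracted away as fixed feature vectors.

```python
# Hedged sketch of the dual-path LSTM decoding with per-step weighted averaging.
import torch
import torch.nn as nn


class CaptionLSTM(nn.Module):
    """One LSTM-based recurrent decoder fed by one feature path."""

    def __init__(self, feat_dim, embed_dim, hidden_dim, vocab_size):
        super().__init__()
        self.init_h = nn.Linear(feat_dim, hidden_dim)   # feature -> initial hidden state
        self.init_c = nn.Linear(feat_dim, hidden_dim)   # feature -> initial cell state
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTMCell(embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def init_state(self, feat):
        return torch.tanh(self.init_h(feat)), torch.tanh(self.init_c(feat))

    def step(self, word_ids, state):
        h, c = self.lstm(self.embed(word_ids), state)
        return self.out(h), (h, c)                       # word scores, new state


def describe(rgb_feat, flow_feat, rgb_dec, flow_dec,
             alpha=0.5, max_len=20, bos=1, eos=2):
    """Greedy decoding: average the two decoders' word distributions each step."""
    state_a = rgb_dec.init_state(rgb_feat)
    state_b = flow_dec.init_state(flow_feat)
    word = torch.tensor([bos])
    sentence = []
    for _ in range(max_len):
        logits_a, state_a = rgb_dec.step(word, state_a)
        logits_b, state_b = flow_dec.step(word, state_b)
        # Weighted average of the two paths' output probabilities at this moment.
        probs = alpha * logits_a.softmax(-1) + (1 - alpha) * logits_b.softmax(-1)
        word = probs.argmax(-1)
        if word.item() == eos:
            break
        sentence.append(word.item())
    return sentence  # word ids; mapping back to vocabulary words is omitted here
```

The sketch uses equal weights (alpha = 0.5) for the frame path and the optical flow path; the patent excerpt only states that the two outputs are weighted and averaged, so the exact weights are left open.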

[0055] Figure 1 is the overall flowchart of the present invention, which comprises the following steps:

[0056] (1) Sample the key frames of the video to be described, and extract the optical flow features between two adjacent frames of the original video (a minimal sketch of this step follows).
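This excerpt does not fix a sampling rate or a specific optical flow algorithm. The sketch below, assuming OpenCV, samples every tenth frame as a key frame and computes Farneback dense optical flow between adjacent original frames; the function name sample_frames_and_flow and the parameter sample_every are illustrative.

```python
# Hedged sketch of step (1): key frame sampling and adjacent-frame optical flow.
import cv2


def sample_frames_and_flow(video_path, sample_every=10):
    cap = cv2.VideoCapture(video_path)
    key_frames, flows = [], []
    prev_gray, idx = None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if idx % sample_every == 0:
            key_frames.append(frame)                      # sampled key frame
        if prev_gray is not None:
            # Dense optical flow between two adjacent original frames (H x W x 2).
            flow = cv2.calcOpticalFlowFarneback(
                prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
            flows.append(flow)
        prev_gray, idx = gray, idx + 1
    cap.release()
    return key_frames, flows
```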



Abstract

The invention discloses a video description method based on a dual-path fractal network and an LSTM. In the method, key frames are sampled from the video to be described and the optical flow features between two adjacent frames of the original video are extracted; two fractal networks are used to learn high-level feature expressions of the video frames and the optical flow features respectively; the high-level feature expressions are input into two LSTM-unit-based recurrent neural network models; and the output values of the two independent models at each moment are weighted and averaged to obtain the description sentence corresponding to the video. The method uses both the original video frames and the optical flow information of the video to be described; the added optical flow features compensate for the dynamic information that is inevitably lost by frame sampling, so that changes of the video in both the spatial and temporal dimensions are taken into account. In addition, the fractal networks perform abstract visual feature expression on the low-level features, so that the persons, objects, behaviors, and spatial position relations involved in the video can be analyzed and mined precisely.
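The "fractal network" feature extractor referenced in the abstract presumably follows the FractalNet expansion rule of Larsson et al.: f_1(x) = conv(x), f_{C+1}(x) = join(f_C(f_C(x)), conv(x)). The PyTorch sketch below implements that rule with an element-wise mean join; the channel count, the number of columns, and the omission of drop-path regularization are simplifying assumptions, not the patent's exact design.

```python
# Hedged sketch of one fractal block in the spirit of FractalNet.
import torch
import torch.nn as nn


class FractalBlock(nn.Module):
    """f_1(x) = conv(x);  f_{C+1}(x) = mean( f_C(f_C(x)), conv(x) )."""

    def __init__(self, channels, num_columns=3):
        super().__init__()

        def conv():
            return nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )

        self.columns = num_columns
        if num_columns == 1:
            self.path = conv()                            # base case: a single conv column
        else:
            self.shortcut = conv()                        # the shallow column
            self.deep_a = FractalBlock(channels, num_columns - 1)
            self.deep_b = FractalBlock(channels, num_columns - 1)

    def forward(self, x):
        if self.columns == 1:
            return self.path(x)
        deep = self.deep_b(self.deep_a(x))                # f_C applied twice
        return (deep + self.shortcut(x)) / 2              # element-wise mean join


# Usage example (illustrative shapes): a 3-column block on a 64-channel feature map.
block = FractalBlock(64, num_columns=3)
out = block(torch.randn(1, 64, 56, 56))                   # -> (1, 64, 56, 56)
```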

Description

technical field

[0001] The invention belongs to the technical field of video description and deep learning, and in particular relates to a video description method based on a two-way fractal network and LSTM.

Background technique

[0002] With the advancement of technology and the development of society, all kinds of video camera terminals, especially smart phones, have become very popular, and the cost of hardware storage has become increasingly low, which has made the flow of multimedia information grow exponentially. Faced with such large video information streams, how to efficiently and automatically analyze, recognize and understand massive video information with minimal human intervention, and thus describe it semantically, has become a hot topic in current image processing and computer vision research. For most people, it may be a simple matter to watch a short video and then describe it in words. However, for a machine, it is a challenging task to generate a natural language description of a video.

Claims


Application Information

IPC(8): G06K9/00; G06K9/62; G06N3/02
CPC: G06N3/02; G06V20/46; G06F18/214
Inventor: 李楚怡, 袁东芝, 余卫宇, 胡丹
Owner: SOUTH CHINA UNIV OF TECH