
Video description method based on dual-path fractal network and LSTM

A video description technology using fractal networks, applied in the field of video description and deep learning

Inactive Publication Date: 2017-07-07
SOUTH CHINA UNIV OF TECH
2 Cites · 28 Cited by

AI Technical Summary

Problems solved by technology

However, for a machine, generating a natural language description by extracting, analyzing, and processing the pixel information of each frame in the video is a challenging task.



Embodiment Construction

[0053] The present invention will be further described in detail below in conjunction with the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.

[0054] Key frames are sampled from the video to be described, and optical flow features are extracted between adjacent frames of the original video. Two fractal networks then learn high-level feature expressions of the key frames and the optical flow features, which are input into two recurrent neural network models based on LSTM units. The output values of the two independent recurrent models at each time step are weighted and averaged to obtain the description sentence corresponding to the video.
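The two-stream fusion in paragraph [0054] can be sketched in plain Python. This is a minimal illustration, not the patented implementation: the vocabulary, the per-time-step word scores standing in for the two LSTM models' outputs, and the fusion weights are all hypothetical placeholders.

```python
# Sketch of two-stream late fusion: at each time step the word-score
# vectors from the RGB-path and optical-flow-path LSTMs are combined by
# a weighted average, and the highest-scoring word is emitted.
# All values below are illustrative placeholders.

vocab = ["a", "man", "dog", "runs", "jumps"]

# Hypothetical per-time-step word scores from each stream (3 time steps).
rgb_scores  = [[0.1, 0.6, 0.1, 0.1, 0.1],
               [0.1, 0.1, 0.1, 0.6, 0.1],
               [0.2, 0.2, 0.2, 0.2, 0.2]]
flow_scores = [[0.1, 0.5, 0.2, 0.1, 0.1],
               [0.1, 0.1, 0.1, 0.2, 0.5],
               [0.1, 0.1, 0.1, 0.6, 0.1]]

def fuse(rgb, flow, w_rgb=0.5, w_flow=0.5):
    """Weighted average of the two streams' scores; one word per step."""
    sentence = []
    for r, f in zip(rgb, flow):
        fused = [w_rgb * rs + w_flow * fs for rs, fs in zip(r, f)]
        sentence.append(vocab[fused.index(max(fused))])
    return sentence

print(fuse(rgb_scores, flow_scores))  # -> ['man', 'runs', 'runs']
```

Note how the flow stream breaks the tie at the last time step, where the RGB scores alone are uniform: this is the sense in which the optical-flow path supplies motion information the sampled frames lack.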

[0055] Figure 1 is the overall flowchart of the present invention, comprising the following steps:

[0056] (1) Sampling the key frames of the video to be described, and extracting the optical flow features between two adjacent frames of the original video.
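Step (1) involves choosing key frames from the video. The excerpt does not specify the sampling rule, so the sketch below assumes simple uniform sampling over frame indices (a common choice), rather than decoding real video.

```python
# Uniform key-frame sampling: pick n_keys frame indices evenly spaced
# over the video.  This is one simple sampling rule, shown only to
# illustrate step (1); the frame and key counts are placeholders.

def sample_key_frames(n_frames, n_keys):
    """Return n_keys frame indices spread evenly over [0, n_frames)."""
    if n_keys >= n_frames:
        return list(range(n_frames))
    step = n_frames / n_keys
    return [int(i * step) for i in range(n_keys)]

print(sample_key_frames(100, 5))  # -> [0, 20, 40, 60, 80]
```

In practice the optical flow features of step (1) would be computed between consecutive frames of the original, unsampled video, so that motion occurring between the sampled key frames is not lost.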



Abstract

The invention discloses a video description method based on a dual-path fractal network and an LSTM. In the method, key frames are sampled from the video to be described, and optical flow features are extracted between adjacent frames of the original video. Two fractal networks learn high-level feature expressions of the video frames and the optical flow features, respectively; these expressions are input into two LSTM-based recurrent neural network models, and the output values of the two independent models at each time step are weighted and averaged to obtain the description sentence corresponding to the video. The method thus uses both the original video frames and the optical flow information: the added optical flow features compensate for the dynamic information inevitably lost by frame sampling, so that changes of the video in both the spatial and temporal dimensions are considered. In addition, the fractal networks turn the low-level features into abstract visual feature expressions, so that the persons, objects, behaviors, and spatial position relations involved in the video can be analyzed and mined precisely.
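The abstract stresses that the optical-flow path restores motion information lost between sampled frames. Real optical flow gives per-pixel motion vectors; as a deliberately crude stand-in for intuition, the sketch below measures motion as the mean absolute difference between two consecutive grayscale frames (toy 3x3 frames, not the method of the patent).

```python
# Crude motion cue between two frames: mean absolute per-pixel
# difference.  Real optical flow (as used in the patent) estimates a
# motion vector per pixel; this toy version only scores how much
# changed, to show why a static-frame path alone misses dynamics.

def motion_magnitude(frame_a, frame_b):
    """Mean absolute per-pixel difference between two frames."""
    total, count = 0, 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count

static_pair = ([[10] * 3] * 3, [[10] * 3] * 3)
moving_pair = ([[10] * 3] * 3,
               [[10, 10, 10], [10, 90, 10], [10, 10, 10]])

print(motion_magnitude(*static_pair))  # -> 0.0
print(motion_magnitude(*moving_pair))  # -> 8.888...
```

A single sampled frame from each pair would look identical; only the inter-frame signal distinguishes them, which is the gap the optical-flow path fills.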

Description

technical field

[0001] The invention belongs to the technical field of video description and deep learning, and in particular relates to a video description method based on a dual-path fractal network and LSTM.

Background technique

[0002] With the advancement of technology and the development of society, video camera terminals of all kinds, especially smart phones, have become very popular, and the price of hardware storage has become increasingly low, so the flow of multimedia information is growing exponentially. Faced with such large video information streams, how to efficiently and automatically analyze, recognize, and understand massive video information with minimal human intervention, and then describe it semantically, has become a hot topic in current image processing and computer vision research. For most people, watching a short video and then describing it in words may be a simple matter. However, for a machine, it is a challenging task to generate a natural language description by extracting, analyzing, and processing the pixel information of each frame in the video.


Application Information

IPC(8): G06K9/00, G06K9/62, G06N3/02
CPC: G06N3/02, G06V20/46, G06F18/214
Inventor: 李楚怡, 袁东芝, 余卫宇, 胡丹
Owner: SOUTH CHINA UNIV OF TECH