Three-dimensional convolution and Faster RCNN-based video action detection method

A video action detection technology based on three-dimensional convolution, applied in the field of image processing. It addresses the problems of simultaneous localization and missing spatial annotation information, and achieves action localization with excellent performance.

Status: Inactive; Publication date: 2018-08-14
BEIJING UNIV OF TECH
Cites: 4; Cited by: 54

AI Technical Summary

Problems solved by technology

However, S-CNN cannot make predictions at fine temporal resolution or locate the precise temporal boundaries of action instances.
At the same time, due to the lack of spatial annotation information...



Examples


Example Embodiment

[0092] Example 1:

[0093] In the present invention, an NVIDIA GPU is used as the computing platform, CUDA for GPU acceleration, and Caffe as the CNN framework.

[0094] S1 data preparation:

[0095] The ActivityNet 1.3 dataset is used in this experiment. It consists only of untrimmed videos and covers 200 activity classes, with 10024 videos in the training set, 4926 in the validation set, and 5044 in the test set. Compared with THUMOS14, it is a large dataset in terms of both the number of activity categories and the number of videos.
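As a consistency check on these figures, the three splits total 10024 + 4926 + 5044 = 19994 videos. The sketch below tallies videos per subset and counts distinct activity labels from an annotation dictionary. It assumes the public ActivityNet v1.3 JSON layout (a top-level `database` mapping video IDs to entries with `subset` and `annotations` fields); the function names are hypothetical and not part of the patent.

```python
from collections import Counter


def count_videos_per_subset(annotations: dict) -> Counter:
    """Count videos per subset (e.g. "training", "validation", "testing").

    Assumption: annotations follow the ActivityNet v1.3 JSON layout
    {"database": {video_id: {"subset": ..., "annotations": [...]}}}.
    """
    return Counter(entry["subset"] for entry in annotations["database"].values())


def count_classes(annotations: dict) -> int:
    """Count distinct activity labels across all annotated segments."""
    labels = {
        seg["label"]
        for entry in annotations["database"].values()
        for seg in entry.get("annotations", [])
    }
    return len(labels)
```

Loading the real file with `json.load` and passing the result to both functions would give the per-split and per-class figures to compare against the numbers quoted above.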

[0096] Step 1.1: Download the ActivityNet 1.3 dataset from http://activity-net.org/download.html to local storage.

[0097] Step 1.2: Decode each downloaded video into frames at 25 frames per second (fps), and store the frames of each subset in folders named after the corresponding videos.
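Step 1.2 can be sketched in Python as follows. The patent only fixes the 25 fps rate and the per-video folders; the choice of ffmpeg, the `<root>/<subset>/<video_name>/` layout, the `%06d.jpg` file pattern, and all function names are assumptions for illustration. The `seconds_to_frames` helper shows how second-based temporal annotations would map onto frame indices at that rate.

```python
import math
from pathlib import Path

FPS = 25  # frame rate stated in Step 1.2


def frame_dir(root: Path, subset: str, video_name: str) -> Path:
    # Hypothetical layout: <root>/<subset>/<video_name>/
    return root / subset / video_name


def ffmpeg_extract_cmd(video_path: Path, out_dir: Path, fps: int = FPS) -> list[str]:
    # Builds (but does not run) an ffmpeg command that resamples the video
    # to `fps` with the fps filter and writes numbered JPEG frames.
    # Using ffmpeg here is an assumption; the patent names no tool.
    return [
        "ffmpeg", "-i", str(video_path),
        "-vf", f"fps={fps}",
        str(out_dir / "%06d.jpg"),
    ]


def seconds_to_frames(start_s: float, end_s: float, fps: int = FPS) -> tuple[int, int]:
    # Map an annotated [start, end] segment in seconds to 0-based frame
    # indices at the extraction rate: start rounds down, end rounds up
    # and is treated as exclusive.
    return math.floor(start_s * fps), math.ceil(end_s * fps)
```

With ffmpeg installed, creating the folder with `out_dir.mkdir(parents=True, exist_ok=True)` and passing the list to `subprocess.run(cmd, check=True)` would perform the extraction; returning the command instead of running it keeps the path logic testable without video files.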

[0098] Step 1.3: According to the data augmentation strategy...



Abstract

The invention discloses a video action detection method based on three-dimensional convolution and Faster RCNN. The method first introduces a new model that encodes the video stream with a three-dimensional fully convolutional network. It then generates, from the resulting features, candidate temporal regions that contain actions, together with a set of candidate boxes. Finally, it classifies and detects the differently clipped candidate boxes, thereby predicting the action categories and the start and end times of actions in the video stream, as well as the spatial bounding boxes of the actions. Compared with existing methods, the proposed method performs excellently on temporal action detection in untrimmed video datasets and can localize actions in the absence of spatial annotation information.
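The abstract names Faster RCNN-style candidate temporal regions computed on 3D-convolutional features but gives no details. As a generic illustration only, the sketch below assumes an anchor-based design: the 3D ConvNet downsamples time by a fixed stride, each temporal position of the feature map receives anchors of several lengths, and overlapping scored segments are pruned by temporal non-maximum suppression (NMS). The stride of 8, the anchor scales, the 0.7 threshold, and the function names are hypothetical values, not taken from the patent.

```python
import numpy as np


def temporal_anchors(num_frames: int, stride: int = 8, scales=(2, 4, 8, 16)) -> np.ndarray:
    """Return (N, 2) anchor segments [start, end] in frame units.

    Assumption: the feature map has num_frames // stride temporal positions,
    and each position gets one anchor per scale (scale in feature-map steps).
    """
    length = num_frames // stride
    centers = (np.arange(length) + 0.5) * stride       # anchor centers in frames
    widths = np.asarray(scales, dtype=float) * stride  # anchor lengths in frames
    starts = centers[:, None] - widths[None, :] / 2
    ends = centers[:, None] + widths[None, :] / 2
    anchors = np.stack([starts, ends], axis=-1).reshape(-1, 2)
    return np.clip(anchors, 0, num_frames)             # keep anchors inside the clip


def temporal_iou(seg: np.ndarray, segs: np.ndarray) -> np.ndarray:
    """Temporal IoU between one segment and an (M, 2) array of segments."""
    inter = np.clip(np.minimum(seg[1], segs[:, 1]) - np.maximum(seg[0], segs[:, 0]), 0, None)
    union = (seg[1] - seg[0]) + (segs[:, 1] - segs[:, 0]) - inter
    return inter / np.maximum(union, 1e-8)


def temporal_nms(segs: np.ndarray, scores, thresh: float = 0.7) -> list:
    """Greedy NMS: keep the best-scoring segment, drop those overlapping it."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[temporal_iou(segs[i], segs[rest]) <= thresh]
    return keep
```

In such a design the anchors would be refined by regressed offsets and scored for "action vs. background" before NMS; the surviving segments are then pooled to a fixed size for classification, which is one way segments of different lengths can share a single classifier.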

Description

Technical field
[0001] The invention belongs to the technical field of image processing and relates to a video action detection method based on three-dimensional convolution and Faster RCNN.

Background
[0002] With the rapid growth of Internet video media, video content detection and analysis has attracted extensive attention from industry and academia in recent years. Action recognition is an important branch of video content detection and analysis. In computer vision, action recognition has made great progress with both hand-crafted features and deep-learning features. Action recognition is usually cast as a classification problem: each action instance used in the training phase is trimmed from a longer video sequence, and the learned action model is then used for action recognition in either trimmed videos (e.g., HMDB51 and UCF101) or untrimmed videos (e.g., THUMOS14 and ActivityNet). However, most videos in the real world are unconstrained...


Application Information

IPC(8): G06K9/00; G06K9/32; G06K9/62; G06N3/04
CPC: G06V40/20; G06V20/40; G06V10/25; G06N3/045; G06F18/24
Inventors: 刘波 (Liu Bo), 聂相琴 (Nie Xiangqin)
Owner: BEIJING UNIV OF TECH