
Three-dimensional convolution and Faster RCNN-based video action detection method

A video action detection method based on three-dimensional convolution and Faster RCNN, applied in the field of image processing. It addresses the problem that existing methods cannot simultaneously localize actions in time and space, and that current untrimmed datasets lack spatial annotation information, achieving accurate action localization and excellent performance.

Inactive Publication Date: 2018-08-14
BEIJING UNIV OF TECH

AI Technical Summary

Problems solved by technology

However, S-CNN lacks the ability to predict at fine temporal resolution and to locate the precise temporal boundaries of action instances. At the same time, because current untrimmed datasets lack spatial annotation information, it is difficult to localize the spatial bounding box of an action while also localizing its temporal boundaries.



Examples


Embodiment 1

[0093] In the present invention, an NVIDIA GPU is used as the computing platform, CUDA is used for GPU acceleration, and Caffe is selected as the CNN framework.

[0094] S1 data preparation:

[0095] The ActivityNet 1.3 dataset is used in this experiment. The ActivityNet dataset consists entirely of untrimmed videos covering 200 different activity types, with 10024 videos in the training set, 4926 videos in the validation set, and 5044 videos in the test set. Compared with THUMOS14, it is a large dataset in terms of both the number of activity categories and the number of videos.

[0096] Step 1.1: Download the ActivityNet 1.3 dataset from http://activity-net.org/download.html to the local machine.

[0097] Step 1.2: Convert each downloaded video into images at 25 frames per second (fps); the images of each subset are placed in folders named after the corresponding videos.
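Step 1.2 can be sketched as follows. This is a minimal illustration only: the patent does not name the extraction tool, so the use of ffmpeg, the JPEG frame naming, and the folder layout are all assumptions.

```python
import os
import shlex

def frame_extraction_command(video_path, output_root, fps=25):
    """Build an ffmpeg command that dumps a video to JPEG frames at the
    given frame rate, into a folder named after the video (layout assumed)."""
    video_name = os.path.splitext(os.path.basename(video_path))[0]
    out_dir = os.path.join(output_root, video_name)
    os.makedirs(out_dir, exist_ok=True)
    # -r sets the output frame rate; %06d numbers the frames sequentially.
    cmd = ["ffmpeg", "-i", video_path, "-r", str(fps),
           os.path.join(out_dir, "%06d.jpg")]
    return shlex.join(cmd)

print(frame_extraction_command("videos/train/v_abc123.mp4", "frames/train"))
```

Running the returned command once per video reproduces the per-subset, per-video folder structure described above.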

[0098] Step 1.3: According to the data augmentation strategy, this experi...



Abstract

The invention discloses a video action detection method based on three-dimensional convolution and Faster RCNN. The method first introduces a new model that encodes the video stream with a three-dimensional fully convolutional network; it then generates candidate temporal regions containing actions from the resulting features, producing a set of candidate segments; finally, it performs classification and detection on the candidate segments at different clip granularities, thereby predicting the action category and the start and end times of each action in the video stream, as well as the spatial bounding boxes of the actions. Compared with existing methods, the proposed method performs excellently on temporal action detection in untrimmed video datasets and can localize actions even in the absence of spatial annotation information.
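The candidate-segment stage described above implies matching temporal proposals against ground-truth action intervals, which is conventionally done with temporal intersection-over-union (tIoU). A minimal sketch of that matching step follows; the positive threshold of 0.7 is an illustrative assumption, not a value taken from the patent.

```python
def temporal_iou(seg_a, seg_b):
    """Temporal intersection-over-union of two (start, end) intervals in seconds."""
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union if union > 0 else 0.0

def label_proposals(proposals, ground_truth, pos_thresh=0.7):
    """Mark a candidate segment positive if it overlaps any ground-truth
    action interval above the threshold (threshold value assumed)."""
    return [any(temporal_iou(p, g) >= pos_thresh for g in ground_truth)
            for p in proposals]

gts = [(10.0, 20.0)]
props = [(9.0, 21.0), (0.0, 5.0), (12.0, 18.0)]
print(label_proposals(props, gts))  # → [True, False, False]
```

The first proposal overlaps the ground truth with tIoU 10/12 ≈ 0.83 and is kept; the third overlaps with tIoU 0.6 and falls below the assumed threshold.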

Description

Technical field

[0001] The invention belongs to the technical field of image processing and relates to a video action detection method based on three-dimensional convolution and Faster RCNN.

Background technique

[0002] With the vigorous development of Internet video media, video content detection and analysis have attracted extensive attention from industry and academia in recent years. Action recognition is an important branch of video content detection and analysis. In the field of computer vision, action recognition has made great progress with both hand-crafted features and deep learning features. Action recognition is usually formulated as a classification problem, where each action instance in the training phase is trimmed from a longer video sequence, and the learned action model is applied either to trimmed videos (e.g., HMDB51 and UCF101) or to untrimmed videos (e.g., action recognition in THUMOS14 and ActivityNet). However, most videos in the real world are unrestricted...

Claims


Application Information

IPC(8): G06K9/00, G06K9/32, G06K9/62, G06N3/04
CPC: G06V40/20, G06V20/40, G06V10/25, G06N3/045, G06F18/24
Inventors: 刘波 (Liu Bo), 聂相琴 (Nie Xiangqin)
Owner: BEIJING UNIV OF TECH