
Method for detecting video action based on convolutional neural network

A convolutional neural network and action detection technique, applied in the field of video action detection based on convolutional neural networks, which can solve problems such as destroying the continuity of an action, the 3D CNN failing to learn motion features, and increased intra-class variability.

Inactive Publication Date: 2017-06-27
PEKING UNIV SHENZHEN GRADUATE SCHOOL
Cites 4, Cited by 68

AI Technical Summary

Problems solved by technology

[0011] (1) Sparse sampling destroys the continuity within an action, so the 3D CNN cannot learn good motion features;
[0012] (2) Video clips obtained at different sampling frequencies share one network for training, which increases intra-class differences, adds to the network's learning burden, and requires a more complex network and more training data.
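The sparse-sampling problem in (1) can be illustrated with a small sketch (the 16-frame clip length and the video lengths are assumptions for illustration, not values from the patent): when every video is down-sampled to a fixed number of frames, the temporal stride, and hence the motion continuity seen by the 3D CNN, varies with video length.

```python
# Illustration of the sparse-sampling problem (hypothetical numbers): a
# traditional pipeline down-samples every video to a fixed clip length, so
# the temporal stride between sampled frames depends on the video length.
import numpy as np

FIXED_LEN = 16  # assumed clip length fed to the 3D CNN

def sparse_sample(num_frames: int, fixed_len: int = FIXED_LEN) -> np.ndarray:
    """Pick `fixed_len` evenly spaced frame indices from a video."""
    return np.linspace(0, num_frames - 1, fixed_len).round().astype(int)

short = sparse_sample(32)   # stride ~2: sampled frames stay nearly contiguous
long_ = sparse_sample(320)  # stride ~21: large temporal gaps break continuity

print("short-video mean stride:", np.diff(short).mean())
print("long-video mean stride:", np.diff(long_).mean())
```

The same action filmed at two lengths thus produces clips with very different frame spacing, which is exactly the intra-class variability problem (2) describes.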

Method used


Image

  • Method for detecting video action based on convolutional neural network (three drawings)

Examples


Embodiment Construction

[0041] The present invention is further described below through embodiments in conjunction with the accompanying drawings, which do not limit the scope of the invention in any way.

[0042] The invention provides a video action detection method based on a convolutional neural network. By adding a spatio-temporal pyramid pooling layer to the traditional network structure, it removes the network's restriction on input size, speeds up training and testing, and better mines the motion information of actions in the video, improving performance in both video action classification and temporal localization. The present invention does not require input video segments to be of the same size.
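A minimal sketch of what a spatio-temporal pyramid pooling layer does, in plain NumPy; the pyramid levels (1, 2, 4) and the use of max-pooling are assumptions for illustration, not specifics from the patent. The point is that feature maps of any temporal and spatial extent pool down to a vector of fixed length, so the fully connected layers that follow never see a variable-size input.

```python
# Sketch of a spatio-temporal pyramid pooling layer (illustrative only).
# At pyramid level n, the (T, H, W) extent of the conv feature map is split
# into n x n x n bins and each bin is max-pooled per channel, so the output
# length depends only on the channel count and the levels, never on T, H, W.
import numpy as np

def stpp(feat: np.ndarray, levels=(1, 2, 4)) -> np.ndarray:
    """Max-pool feat of shape (C, T, H, W) into a fixed-length vector."""
    C, T, H, W = feat.shape
    out = []
    for n in levels:
        t_edges = np.linspace(0, T, n + 1).astype(int)
        h_edges = np.linspace(0, H, n + 1).astype(int)
        w_edges = np.linspace(0, W, n + 1).astype(int)
        for ti in range(n):
            for hi in range(n):
                for wi in range(n):
                    cell = feat[:, t_edges[ti]:t_edges[ti + 1],
                                   h_edges[hi]:h_edges[hi + 1],
                                   w_edges[wi]:w_edges[wi + 1]]
                    out.append(cell.max(axis=(1, 2, 3)))  # (C,) per bin
    return np.concatenate(out)  # length = C * (1 + 8 + 64) bins

a = stpp(np.random.rand(64, 10, 7, 7))    # short, small clip
b = stpp(np.random.rand(64, 37, 14, 14))  # longer, larger clip
assert a.shape == b.shape == (64 * 73,)   # fixed length either way
```

Because the output length is constant, whole videos of different durations can be fed to the network directly, which is what lets the method drop the downsampling step.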

[0043] As shown in figure 2, a traditional convolutional neural network requires input video clips of the same size, so the clips must be down-sampled before being fed into the network. The present invention, however, removes the downsampling process and inser...



Abstract

The invention discloses a video action detection method and relates to the technical field of computer vision recognition. The method is based on a convolutional neural network: a spatio-temporal pyramid pooling layer is added to the network structure, which removes the network's restriction on input size, speeds up training and detection, and improves performance in both video action classification and temporal localization. The convolutional neural network consists of convolutional layers, ordinary pooling layers, spatio-temporal pyramid pooling layers and fully connected layers; its output comprises a category classification output layer and a temporal localization output layer. With this method, video clips of different durations need not be obtained through downsampling; instead, the whole video is fed in directly in one pass, which improves efficiency. At the same time, because the network is trained on clips of a single frequency, intra-class differences are not increased, the network's learning burden is reduced, the model converges faster, and detection performs better.
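The abstract's two output layers can be sketched as a shared fully connected stage followed by two heads, one scoring action categories and one regressing the start and end of the action. All layer sizes and weights below are placeholder assumptions, not values from the patent.

```python
# Minimal sketch of the two output heads named in the abstract: a shared FC
# layer over the fixed-length pyramid-pooled feature, then a classification
# head (softmax over categories) and a temporal-localization head (start, end).
# Sizes and random weights are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM, HIDDEN, NUM_CLASSES = 4672, 256, 20  # assumed dimensions

W_fc = rng.standard_normal((FEAT_DIM, HIDDEN)) * 0.01
W_cls = rng.standard_normal((HIDDEN, NUM_CLASSES)) * 0.01
W_loc = rng.standard_normal((HIDDEN, 2)) * 0.01  # (start, end)

def forward(feat):
    h = np.maximum(feat @ W_fc, 0)                 # shared FC + ReLU
    logits = h @ W_cls
    probs = np.exp(logits) / np.exp(logits).sum()  # category classification
    start, end = h @ W_loc                         # temporal localization
    return probs, (start, end)

probs, (start, end) = forward(rng.standard_normal(FEAT_DIM))
assert probs.shape == (NUM_CLASSES,) and abs(probs.sum() - 1) < 1e-6
```

In training, the classification head would carry a cross-entropy loss and the localization head a regression loss; the abstract does not specify the losses, so none are shown here.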

Description

technical field

[0001] The invention relates to computer vision recognition technology, and in particular to a video action detection method based on a convolutional neural network.

Background technique

[0002] In recent years, almost everyone carries a mobile phone, and every mobile phone has a camera. Together with the growth of the Internet and advances in communication technology, people increasingly shoot videos and share them online, so the number of videos is growing explosively, and video storage and analysis technology has become very important.

[0003] Video action detection means classifying the action in a video and giving its start time and end time, as shown for example in figure 1. In recent years, the task of video action recognition has made great progress, but it mainly applies to trimmed videos, i.e. videos that contain a single action and no redundant frames. As a result, scholars began to stud...
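The detection target in [0003], classify the action and give its start and end time, is naturally represented as (label, start, end) segments. A toy post-processing helper (hypothetical, not from the patent) that merges consecutive identical per-frame predictions into such segments:

```python
# Toy illustration of the detection output format: merge runs of identical
# per-frame labels into (label, start_frame, end_frame) segments, skipping a
# designated background label. Label values here are made up for the example.
def frames_to_segments(labels, background=0):
    segments = []
    i, n = 0, len(labels)
    while i < n:
        j = i
        while j < n and labels[j] == labels[i]:
            j += 1                                  # extend the current run
        if labels[i] != background:
            segments.append((labels[i], i, j - 1))  # inclusive frame range
        i = j
    return segments

print(frames_to_segments([0, 0, 3, 3, 3, 0, 7, 7]))
# → [(3, 2, 4), (7, 6, 7)]
```

Each tuple answers both questions the task poses: which action occurred (the label) and when (the start and end frame indices).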

Claims


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06K9/00, G06K9/62, G06N3/04, G06V10/764
CPC: G06N3/04, G06V20/42, G06F18/24, G06F18/214, G06N3/084, G06V20/41, G06V10/82, G06V10/764, G06N3/045, G06F17/15, G06F17/18, G06N3/08, G06N3/047
Inventor: 王文敏, 李志豪, 王荣刚, 李革, 董胜富, 王振宇, 李英, 赵辉, 高文
Owner PEKING UNIV SHENZHEN GRADUATE SCHOOL