
Video feature processing method and device and three-dimensional convolutional neural network model

A video-feature and three-dimensional-convolution technology applied in the field of electronic information, achieving the effect of improved accuracy

Active Publication Date: 2019-09-24
BEIJING QIYI CENTURY SCI & TECH CO LTD
Cites: 6 · Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] This application provides a video feature processing method and device, the purpose of which is to solve the problem of how to fully extract the information in a video without consuming excessive resources, thereby improving the accuracy of video classification.



Examples


Detailed Description of the Embodiments

[0045] The inventors found that when a three-dimensional convolutional neural network model extracts video features, the quality of the extracted time-domain features varies across videos. Further research showed that this quality is directly related to video length: for videos of moderate length the time-domain features are good, but for very short or very long videos they are relatively poor. The reason is that, for a convolutional layer with a single fixed convolution-kernel dilation coefficient, when the video is short it is difficult to capture the particular frames in which a change occurs, so key information is missed and the time-domain features suffer; and when the video is long, if an action lasts for several seconds, it is difficult to capture the moment the action changes in a timely and effective manner.
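The trade-off described in paragraph [0045] can be made concrete by computing the temporal receptive field of a dilated convolution. This is an illustrative sketch, not part of the patent text; the kernel size of 3 and the dilation values are assumptions chosen for the example.

```python
# Temporal receptive field of a 1-D dilated convolution:
# a kernel of size k with dilation d spans (k - 1) * d + 1 frames.
def temporal_receptive_field(kernel_size, dilation):
    return (kernel_size - 1) * dilation + 1

for d in (1, 2, 4, 8):
    print(d, temporal_receptive_field(3, d))
# d = 1 spans 3 frames (fine-grained, suits short clips or fast changes);
# d = 8 spans 17 frames (coarse, suits actions that last several seconds).
```

A single fixed dilation therefore favors one clip-length regime at the expense of the other, which is the motivation for mixing several dilations.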

[0046] In order to solve the above problems, the embodiment of...



Abstract

The invention discloses a video feature processing method and device, and a three-dimensional convolutional neural network model. The method comprises: obtaining a video feature vector produced by three-dimensional convolution processing; performing convolution processing on the video feature vector in the spatial domain to obtain a spatial-domain processing result; dividing the spatial-domain processing result into at least two groups of spatial-domain processing sub-results and performing convolution processing on each group in the time domain to obtain at least two groups of time-domain processing sub-results, where the convolution-kernel dilation coefficients used for the time-domain convolution of the respective groups differ from one another; and concatenating the at least two groups of time-domain processing sub-results to obtain the processed video feature vector. Because each group of spatial-domain processing sub-results is convolved with a different convolution-kernel dilation coefficient, the time-domain convolution processes the video features at multiple scales, so the information of an image at the moment of change can be captured more promptly and comprehensively, improving the accuracy of the time-domain features.
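The grouped multi-scale temporal convolution described in the abstract can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the patented implementation: the even channel split, the toy smoothing kernel, the edge padding, and the dilation set (1, 2, 4) are all choices made for the example.

```python
import numpy as np

def dilated_temporal_conv(x, kernel, dilation):
    """1-D convolution along the time axis with a dilated kernel.
    x: (channels, T); kernel: (k,), shared across channels for simplicity."""
    k = kernel.shape[0]
    pad = (k - 1) * dilation // 2
    # edge padding keeps the sequence length T and avoids zero borders
    xp = np.pad(x, ((0, 0), (pad, pad)), mode="edge")
    out = np.zeros_like(x, dtype=float)
    for t in range(x.shape[1]):
        for i in range(k):
            out[:, t] += kernel[i] * xp[:, t + i * dilation]
    return out

def multi_scale_temporal_conv(feat, dilations=(1, 2, 4)):
    """Split the channels into len(dilations) groups, convolve each group in
    time with its own dilation coefficient, then concatenate the groups back:
    the grouped multi-scale temporal convolution sketched in the abstract."""
    groups = np.array_split(feat, len(dilations), axis=0)
    kernel = np.array([0.25, 0.5, 0.25])  # toy smoothing kernel (assumption)
    outs = [dilated_temporal_conv(g, kernel, d)
            for g, d in zip(groups, dilations)]
    return np.concatenate(outs, axis=0)

feat = np.random.randn(12, 16)  # 12 channels x 16 frames, post spatial conv
print(multi_scale_temporal_conv(feat).shape)  # (12, 16)
```

Each group keeps the same temporal length, so concatenation along the channel axis recovers the original feature shape while mixing three temporal scales.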

Description

Technical Field

[0001] The present application relates to the field of electronic information, and in particular to a video feature processing method and device and a three-dimensional convolutional neural network model.

Background

[0002] Video feature extraction is a basic link in video processing; almost every video analysis and processing pipeline begins by extracting video features. The three-dimensional convolutional neural network model is widely used because it can capture time-domain and spatial-domain feature information at the same time. Using a three-dimensional convolutional neural network model for video feature extraction amounts to performing convolution operations on the time and space dimensions simultaneously, so as to obtain both the visual features of each frame of the video and the correlation of adjacent image frames over time.

[0003] However, when the current 3D convolutional neural network mo...
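The simultaneous time-and-space convolution described in paragraph [0002] can be illustrated with a naive NumPy sketch; the single-channel input, "valid" boundary handling, and averaging kernel are assumptions for the example, not the model's actual layers.

```python
import numpy as np

def conv3d_valid(video, kern):
    """Naive 'valid' 3-D convolution over (T, H, W): one sliding kernel
    aggregates a short time window and a spatial patch at the same time."""
    kt, kh, kw = kern.shape
    T, H, W = video.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                out[t, y, x] = np.sum(video[t:t+kt, y:y+kh, x:x+kw] * kern)
    return out

video = np.random.randn(8, 16, 16)   # 8 frames of 16x16 features
kern = np.ones((3, 3, 3)) / 27.0     # toy 3x3x3 averaging kernel
print(conv3d_valid(video, kern).shape)  # (6, 14, 14)
```

Because the kernel's first axis spans several frames, each output value already mixes per-frame appearance with frame-to-frame change, which is what gives 3D convolution its joint spatial and temporal features.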

Claims


Application Information

IPC (8): G06K9/00; G06N3/04; G06N3/08; G06T3/40; G06T5/30
CPC: G06T3/4038; G06T5/30; G06N3/08; G06T2207/10016; G06T2200/32; G06V20/46; G06N3/045
Inventor: 张云桃; 晋瑞锦
Owner BEIJING QIYI CENTURY SCI & TECH CO LTD