Video feature learning method based on video and text pair discriminant analysis

A video feature and discriminant analysis technology, applied to neural learning methods, biometric recognition, and character and pattern recognition, that addresses the problems of high labor cost, lack of practicability and scalability, and the inability to fully extract video information, with the effect of reducing labor costs.

Pending Publication Date: 2020-06-05
NANJING UNIV

AI Technical Summary

Problems solved by technology

[0005] The problem to be solved by the present invention is as follows: most current video feature learning methods use manually annotated action category labels as supervision information to train three-dimensional convolutional neural networks on action recognition tasks. Such approaches suffer from high labor costs, a lack of practicability and scalability, and an inability to fully extract the information contained in videos.




Embodiment Construction

[0027] The present invention proposes a video feature learning method based on video and text description pair discrimination. A video-text pair is formed from a video and the text description matched with it; a three-dimensional convolutional network is used to extract video features, and a DistilBERT network is used to extract text description features. Through training, a video and its corresponding text description are made to have similar semantic features, so that the text description automatically becomes the label of the corresponding video, and the training builds a deep learning network for learning video features.
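As a concrete illustration, below is a minimal sketch of this two-branch setup in PyTorch. It assumes torchvision's r3d_18 as the three-dimensional convolutional backbone and Hugging Face's distilbert-base-uncased as the text encoder; the 256-dimensional projection matches the feature size given in [0029], while the symmetric in-batch InfoNCE loss is only an illustrative stand-in for the patent's pair discrimination objective (the preparatory stage in [0029] suggests a memory-bank formulation instead).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models.video import r3d_18
from transformers import DistilBertModel, DistilBertTokenizer

class VideoTextModel(nn.Module):
    def __init__(self, proj_dim=256):
        super().__init__()
        # Three-dimensional convolutional network for the video branch.
        self.video_net = r3d_18(weights=None)
        self.video_net.fc = nn.Linear(self.video_net.fc.in_features, proj_dim)
        # DistilBERT network for the text description branch.
        self.text_net = DistilBertModel.from_pretrained("distilbert-base-uncased")
        self.text_proj = nn.Linear(self.text_net.config.hidden_size, proj_dim)

    def forward(self, clips, input_ids, attention_mask):
        # clips: (B, 3, T, H, W). Both branches project into a shared
        # 256-dimensional space, L2-normalized for cosine similarity.
        v = F.normalize(self.video_net(clips), dim=-1)
        t_out = self.text_net(input_ids=input_ids, attention_mask=attention_mask)
        t = F.normalize(self.text_proj(t_out.last_hidden_state[:, 0]), dim=-1)
        return v, t

def pair_discrimination_loss(v, t, temperature=0.07):
    # Matched video-text pairs are positives; every other pairing in the
    # batch acts as a negative (symmetric InfoNCE, an illustrative choice).
    logits = v @ t.T / temperature
    targets = torch.arange(v.size(0), device=v.device)
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.T, targets)) / 2

# Example usage with two hypothetical 16-frame clips and their descriptions.
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = VideoTextModel()
clips = torch.randn(2, 3, 16, 112, 112)
tokens = tokenizer(["a man dribbles a basketball",
                    "a woman plays the violin"],
                   padding=True, return_tensors="pt")
v, t = model(clips, tokens["input_ids"], tokens["attention_mask"])
loss = pair_discrimination_loss(v, t)
```

The key design point is that no action category labels appear anywhere: the pairing between a video and its own description is the only supervision signal.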

[0028] Specifically, the method includes the following steps:

[0029] 1) Preparatory stage: construct two historical feature sequences of size N×256 to store the features of the videos and text descriptions in the database, respectively, where N is the number of videos in the database and the feature dimension is 256. The historical feature sequen...
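A minimal sketch of this preparatory step, assuming the two N×256 historical feature sequences are implemented as L2-normalized memory banks with random initialization and a momentum update; the momentum rule and its value are assumptions for illustration, not details taken from the patent text.

```python
import torch
import torch.nn.functional as F

def init_memory_banks(num_videos, dim=256):
    # Two N x 256 historical feature sequences, one for video features and
    # one for text description features, L2-normalized so that dot products
    # behave as cosine similarities.
    video_bank = F.normalize(torch.randn(num_videos, dim), dim=-1)
    text_bank = F.normalize(torch.randn(num_videos, dim), dim=-1)
    return video_bank, text_bank

@torch.no_grad()
def update_bank(bank, indices, features, momentum=0.5):
    # Blend the stored historical features with freshly computed ones,
    # then re-normalize (a common memory-bank update rule, assumed here).
    bank[indices] = F.normalize(
        momentum * bank[indices] + (1.0 - momentum) * features, dim=-1)

# Example: banks for a database of 10,000 videos.
video_bank, text_bank = init_memory_banks(num_videos=10000)
```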



Abstract

The invention discloses a video feature learning method based on video and text description pair discrimination. A video-text pair is formed from a video and the text description matched with it; a three-dimensional convolutional network is used to extract video features, and a DistilBERT network is used to extract text description features. Through training, the video and its corresponding text description come to have similar semantic features, so the text description automatically becomes a label for the corresponding video, and a deep learning network is trained and constructed for learning video features. With this method, which uses text description information as auxiliary supervision, efficient video feature representations can be learned while labor costs are effectively reduced, and all data in a dataset can be used more effectively to obtain video representations with higher discriminative capability.

Description

Technical Field

[0001] The invention belongs to the technical field of computer software and relates to video representation technology, in particular to a video feature learning method based on video and text description pair discrimination.

Background

[0002] With the explosive growth of video data on the Internet, the demand for intelligent video analysis continues to rise. The basis and key of video analysis is obtaining video features that effectively describe the information contained in a video; on this basis, a variety of specific video analysis applications can be constructed. Learning efficient video features using deep learning techniques has become a commonly adopted approach, and these methods can be roughly divided into three categories.

[0003] The first category uses a three-dimensional convolutional neural network trained on a manually annotated action recognition dataset to learn video features. In recent years, many manually labeled large-scale action rec...


Application Information

IPC(8): G06K9/00, G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/084, G06V40/10, G06V40/20, G06V20/40, G06N3/045, G06F18/241
Inventor: 王利民, 李天昊, 武港山
Owner: NANJING UNIV