Video feature learning method based on video and text pair discriminant analysis

A video feature and discriminant analysis technology, applied to neural learning methods, biometric recognition, and character and pattern recognition, that addresses the problems of high labor cost, lack of practicability and scalability, and the inability to fully extract video information, with the effect of reducing labor costs.

Pending Publication Date: 2020-06-05
NANJING UNIV

AI Technical Summary

Problems solved by technology

[0005] The problem to be solved by the present invention is as follows: most current video feature learning methods use manually annotated action category labels as supervision information to train three-dimensional convolutional neural networks on action recognition tasks. Such approaches suffer from high labor costs, a lack of practicability and scalability, and an inability to fully extract the information contained in videos.




Embodiment Construction

[0027] The present invention proposes a video feature learning method based on video and text description pair discrimination. A video-text pair is formed from a video and the text description matched with it; a three-dimensional convolutional network is used to extract video features, and a DistilBERT network is used to extract text description features. Through training, a video and its corresponding text description are made to have similar semantic features, so that the text description automatically becomes the label of the corresponding video, and the training builds a deep learning network for learning video features.
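As a concrete illustration, below is a minimal sketch of this two-branch setup in PyTorch. It assumes torchvision's r3d_18 as the three-dimensional convolutional backbone and Hugging Face's distilbert-base-uncased as the text encoder; the 256-dimensional projection matches the feature size given in [0029], while the symmetric in-batch InfoNCE loss is only an illustrative stand-in for the patent's pair discrimination objective (the preparatory stage in [0029] suggests a memory-bank formulation instead).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models.video import r3d_18
from transformers import DistilBertModel, DistilBertTokenizer

class VideoTextModel(nn.Module):
    def __init__(self, proj_dim=256):
        super().__init__()
        # Three-dimensional convolutional network for the video branch.
        self.video_net = r3d_18(weights=None)
        self.video_net.fc = nn.Linear(self.video_net.fc.in_features, proj_dim)
        # DistilBERT network for the text description branch.
        self.text_net = DistilBertModel.from_pretrained("distilbert-base-uncased")
        self.text_proj = nn.Linear(self.text_net.config.hidden_size, proj_dim)

    def forward(self, clips, input_ids, attention_mask):
        # clips: (B, 3, T, H, W). Both branches project into a shared
        # 256-dimensional space, L2-normalized for cosine similarity.
        v = F.normalize(self.video_net(clips), dim=-1)
        t_out = self.text_net(input_ids=input_ids, attention_mask=attention_mask)
        t = F.normalize(self.text_proj(t_out.last_hidden_state[:, 0]), dim=-1)
        return v, t

def pair_discrimination_loss(v, t, temperature=0.07):
    # Matched video-text pairs are positives; every other pairing in the
    # batch acts as a negative (symmetric InfoNCE, an illustrative choice).
    logits = v @ t.T / temperature
    targets = torch.arange(v.size(0), device=v.device)
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.T, targets)) / 2

# Example usage with two hypothetical 16-frame clips and their descriptions.
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = VideoTextModel()
clips = torch.randn(2, 3, 16, 112, 112)
tokens = tokenizer(["a man dribbles a basketball",
                    "a woman plays the violin"],
                   padding=True, return_tensors="pt")
v, t = model(clips, tokens["input_ids"], tokens["attention_mask"])
loss = pair_discrimination_loss(v, t)
```

The key design point is that no action category labels appear anywhere: the pairing between a video and its own description is the only supervision signal.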

[0028] Specifically, the method includes the following steps:

[0029] 1) Preparatory stage: construct two historical feature sequences of size N×256 to store the features of the videos and text descriptions in the database, respectively, where N is the number of videos in the database and the feature dimension is 256. The historical feature sequen...
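A minimal sketch of this preparatory step, assuming the two N×256 historical feature sequences are implemented as L2-normalized memory banks with random initialization and a momentum update; the momentum rule and its value are assumptions for illustration, not details taken from the patent text.

```python
import torch
import torch.nn.functional as F

def init_memory_banks(num_videos, dim=256):
    # Two N x 256 historical feature sequences, one for video features and
    # one for text description features, L2-normalized so that dot products
    # behave as cosine similarities.
    video_bank = F.normalize(torch.randn(num_videos, dim), dim=-1)
    text_bank = F.normalize(torch.randn(num_videos, dim), dim=-1)
    return video_bank, text_bank

@torch.no_grad()
def update_bank(bank, indices, features, momentum=0.5):
    # Blend the stored historical features with freshly computed ones,
    # then re-normalize (a common memory-bank update rule, assumed here).
    bank[indices] = F.normalize(
        momentum * bank[indices] + (1.0 - momentum) * features, dim=-1)

# Example: banks for a database of 10,000 videos.
video_bank, text_bank = init_memory_banks(num_videos=10000)
```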



Abstract

The invention discloses a video feature learning method based on video and text description pair discrimination. A video-text pair is formed from a video and the text description matched with it; a three-dimensional convolutional network is used to extract video features, and a DistilBERT network is used to extract text description features. Through training, the video and its corresponding text description come to have similar semantic features, so the text description automatically becomes a label for the corresponding video, and a deep learning network is trained and constructed for learning video features. With this method, which uses text description information as auxiliary supervision, efficient video feature representations can be learned while labor costs are effectively reduced, and all data in a dataset can be used more effectively to obtain video representations with higher discriminative capability.

Description

Technical Field

[0001] The invention belongs to the technical field of computer software and relates to video representation technology, in particular to a video feature learning method based on video and text description pair discrimination.

Background

[0002] With the explosive growth of video data on the Internet, the demand for intelligent video analysis continues to rise. The basis and key of video analysis is obtaining video features that effectively describe the information contained in a video; on this basis, a variety of specific video analysis applications can be constructed. Learning efficient video features using deep learning techniques has become a commonly adopted approach, and these methods can be roughly divided into three categories.

[0003] The first category uses a three-dimensional convolutional neural network trained on a manually annotated action recognition dataset to learn video features. In recent years, many manually labeled large-scale action rec...


Application Information

IPC(8): G06K9/00, G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/084, G06V40/10, G06V40/20, G06V20/40, G06N3/045, G06F18/241
Inventor: 王利民, 李天昊, 武港山
Owner: NANJING UNIV