Short video classification method based on multi-modal feature complete representation
A classification method and short video technology, applied to video data clustering/classification, retrieval, and indexing, to achieve the effect of improved accuracy
Inactive Publication Date: 2021-07-23
TIANJIN UNIV
AI Technical Summary
Problems solved by technology
[0006] The present invention provides a short video classification method based on the complete representation of multimodal features,
Method used
Examples
Embodiment 1
[0027] An embodiment of the present invention provides a short video classification method based on the complete representation of multimodal features, which makes full use of the content information and the label information of the short video. Referring to figure 1, the method includes the following steps:
[0028] 101: For the content information, experience shows that the semantic feature representation of the visual modality is especially important in short video multi-label classification tasks. Therefore, a representation learning scheme based on visual modality features is proposed: taking the visual modality features as the core, four subspaces are constructed from the modality-missing perspective, information complementarity between modalities is learned, and latent representations of two types of visual modality features are obtained. Considering the consistency of visual modality feature information, and in order to obtain a more compact representation of the visual modality features, the latent representations of the two typ...
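A minimal sketch of how the subspace latent representations described in step 101 might be fused by an auto-encoding/decoding network into one common representation. The class name, all layer sizes, and the use of mean-squared error as the reconstruction loss are illustrative assumptions, not taken from the patent:

```python
import torch
import torch.nn as nn

class SubspaceFusionAE(nn.Module):
    """Hypothetical sketch: concatenate the latent representations of the
    four modality-missing subspaces, encode them into a common latent
    representation, and decode for a reconstruction loss term."""
    def __init__(self, d_latent=64, d_common=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(4 * d_latent, 128), nn.ReLU(),
                                     nn.Linear(128, d_common))
        self.decoder = nn.Sequential(nn.Linear(d_common, 128), nn.ReLU(),
                                     nn.Linear(128, 4 * d_latent))

    def forward(self, subspace_latents):
        x = torch.cat(subspace_latents, dim=-1)   # join the four subspace latents
        z = self.encoder(x)                       # common latent representation
        recon = self.decoder(z)                   # reconstruction for the loss
        return z, nn.functional.mse_loss(recon, x)
```

The reconstruction loss would then be one term of the overall objective alongside the multi-label classification loss, as the abstract describes.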
Embodiment 2
[0039] The scheme in Embodiment 1 is further described below in conjunction with calculation formulas and examples; see the following description for details:
[0040] 201: The model takes a complete short video as input and extracts three modality features: visual, audio, and track, respectively;
[0041] For the visual modality, key frames are extracted, the classic image feature extraction network ResNet (residual network) is applied to all video key frames, and an averaging (AvePooling) operation is then performed to obtain the overall feature z_v from the original visual features X_v:

[0042] z_v = AvePooling(ResNet(X_v; β_v))

[0043] where ResNet(·) is the residual network, AvePooling(·) is the averaging operation, X_v denotes the original visual features of the short video, β_v denotes the network parameters to be learned, and the visual modality feature z_v has dimension d_v.
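As a small illustration of the AvePooling step described above, assuming the per-key-frame ResNet features have already been computed as a matrix of shape (t, d_v), the overall visual feature z_v is simply their temporal mean. The function name and dimensions here are hypothetical:

```python
import numpy as np

def visual_feature(frame_features: np.ndarray) -> np.ndarray:
    """AvePooling over key frames: frame_features has shape (t, d_v),
    one (assumed precomputed) ResNet feature vector per key frame."""
    return frame_features.mean(axis=0)  # z_v, shape (d_v,)

# e.g. 8 key frames with d_v = 2048 (a typical ResNet feature size)
frames = np.random.rand(8, 2048)
z_v = visual_feature(frames)
```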
[0044] For the audio modality, the sound spectrogram is drawn, and "CNN + LSTM" (convolutional neural network + long short-term memory network) is used to extract the sound features of t...
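One plausible reading of the "CNN + LSTM" audio branch can be sketched as follows: a small convolutional network encodes the spectrogram frames and an LSTM summarizes the resulting sequence. The class name, the 1-D convolution over mel bins, and all layer sizes are assumptions for illustration only:

```python
import torch
import torch.nn as nn

class AudioCNNLSTM(nn.Module):
    """Hypothetical sketch of a 'CNN + LSTM' audio feature extractor:
    convolutions over the spectrogram, then an LSTM whose final hidden
    state serves as the overall audio feature."""
    def __init__(self, n_mels=64, d_a=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(n_mels, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, padding=1), nn.ReLU())
        self.lstm = nn.LSTM(input_size=64, hidden_size=d_a, batch_first=True)

    def forward(self, spec):            # spec: (batch, n_mels, time)
        h = self.cnn(spec)              # (batch, 64, time)
        h = h.transpose(1, 2)           # (batch, time, 64) for the LSTM
        _, (h_n, _) = self.lstm(h)      # final hidden state
        return h_n[-1]                  # (batch, d_a) audio feature
```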
Abstract
The invention discloses a short video classification method based on multi-modal feature complete representation. For the content information of a short video, the method constructs four subspaces from the perspective of modality missing, mainly based on the visual modality features, and obtains a latent feature representation for each; the latent representations of the four subspaces are then fused by an auto-encoding/decoding network so that a more robust and effective common latent representation is learned. For the label information, inverse covariance estimation and a graph attention network are used to explore the correlation between labels and to update the label representations, yielding the label vector representation corresponding to the short video. A multi-head cross-modal fusion scheme based on multi-head attention is applied to the common latent representation and the label vector representation to obtain the label prediction scores of the short video. The overall loss function of the model consists of the traditional multi-label classification loss and the reconstruction loss of the auto-encoding/decoding network; it measures the difference between the network output and the actual value and guides the network toward the optimal solution of the model.
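The multi-head cross-modal fusion step in the abstract might look roughly like the following, using standard multi-head attention with the label vectors as queries over the common latent representation and a scoring head for per-label predictions. All dimensions, names, the head count, and the sigmoid scoring head are illustrative assumptions:

```python
import torch
import torch.nn as nn

d_model, n_heads, n_labels = 64, 4, 10

# Label vectors attend to the video's common latent representation;
# a linear + sigmoid head turns each fused vector into a label score.
attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=n_heads,
                             batch_first=True)
score_head = nn.Linear(d_model, 1)

labels = torch.randn(1, n_labels, d_model)  # label vector representations
video = torch.randn(1, 1, d_model)          # common latent representation

fused, _ = attn(query=labels, key=video, value=video)   # (1, n_labels, d_model)
scores = torch.sigmoid(score_head(fused)).squeeze(-1)   # (1, n_labels) in [0, 1]
```

In training, these scores would feed the multi-label classification loss that, together with the reconstruction loss, forms the overall objective.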
Description
technical field

[0001] The invention relates to the field of short video classification, and in particular to a short video classification method based on the complete representation of multimodal features.

Background technique

[0002] In recent years, with the popularity of smart terminals and social networks, more and more information is presented as multimedia content. High-definition cameras, large-capacity storage, and high-speed network connections have created extremely convenient shooting and sharing conditions for users, generating massive amounts of multimedia data.

[0003] As a new type of user-generated content, short videos have been warmly welcomed on social networks thanks to unique advantages such as a low barrier to creation, fragmented content, and strong social attributes. Especially since 2011, with the popularization of mobile Internet terminals, faster networks, and reduced traffic charges, short videos have quickly...
Claims