
Short video classification method based on multi-modal feature complete representation

A short video classification method, applicable to video data clustering/classification, video data retrieval, video data indexing, etc., with the effect of improving classification accuracy

Publication Date: 2021-07-23 (Inactive)
TIANJIN UNIV

AI Technical Summary

Problems solved by technology

[0006] The present invention provides a short video classification method based on the complete representation of multi-modal features, which solves the multi-label short video classification problem and evaluates the results; see the following description for details:



Examples


Embodiment 1

[0027] The embodiment of the present invention provides a short video classification method based on the complete representation of multi-modal features, which makes full use of the content information and label information of short videos; see Figure 1. The method includes the following steps:

[0028] 101: For the content information, experience shows that the semantic feature representation of the visual modality is particularly important in short-video multi-label classification tasks. A representation learning scheme based on visual modality features is therefore proposed: centered on the visual modality features, four subspaces are constructed from the modality-missing perspective, the complementary information between modalities is learned, and latent representations of two types of visual modality features are obtained. Considering the consistency of visual modality feature information, in order to obtain a more compact representation of the visual modality features, the latent representations of the two types...
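A minimal sketch of this fusion idea is given below in PyTorch. The patent publishes no code, so the class name, layer sizes, and dimensions are all illustrative assumptions; the sketch only shows the general pattern the abstract describes: latent visual representations are combined and passed through an auto-encoding/decoding network to learn a common latent representation, with the reconstruction error available as a loss term.

import torch
import torch.nn as nn

class LatentFusionAutoencoder(nn.Module):
    """Hypothetical fusion of two latent visual representations (not the patent's exact network)."""
    def __init__(self, d_latent=256, d_common=128):
        super().__init__()
        # Encoder: concatenated latent representations -> common latent space.
        self.encoder = nn.Sequential(nn.Linear(2 * d_latent, d_common), nn.ReLU())
        # Decoder: common latent space -> reconstruction of the concatenation.
        self.decoder = nn.Linear(d_common, 2 * d_latent)

    def forward(self, h1, h2):
        x = torch.cat([h1, h2], dim=-1)   # combine the two latent views
        s = self.encoder(x)               # common latent representation
        x_rec = self.decoder(s)           # reconstruction used for the loss term
        recon_loss = nn.functional.mse_loss(x_rec, x)
        return s, recon_loss

h1, h2 = torch.randn(4, 256), torch.randn(4, 256)  # two latent visual views (dummy data)
s, recon = LatentFusionAutoencoder()(h1, h2)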

Embodiment 2

[0039] The scheme in Embodiment 1 is further described below in conjunction with calculation formulas and examples; see the following description for details:

[0040] 201: The model takes a complete short video as input and extracts three modal features: visual, audio, and trajectory;

[0041] For the visual modality, key frames are extracted, the classic image feature extraction network ResNet (residual network) is applied to all video key frames, and an average pooling (AvePooling) operation is then performed to obtain the overall feature $z_v$ of the visual modality feature $X_v$:

[0042] $z_v = \mathrm{AvePooling}(\mathrm{ResNet}(X_v; \beta_v))$

[0043] Among them, $\mathrm{ResNet}(\cdot)$ is the residual network, $\mathrm{AvePooling}(\cdot)$ is the average pooling operation, $X_v$ is the original visual feature of the short video, $\beta_v$ is the network parameters to be learned, and the dimension of the visual modality feature $z_v$ is $d_v$.
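As a rough sketch of this step, assuming a torchvision ResNet-50 backbone stands in for the unspecified ResNet variant (the patent does not name one), the computation $z_v = \mathrm{AvePooling}(\mathrm{ResNet}(X_v; \beta_v))$ might look like:

import torch
from torchvision.models import resnet50

backbone = resnet50(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()  # keep the 2048-d pooled frame features
backbone.eval()

frames = torch.randn(12, 3, 224, 224)  # e.g. 12 key frames, already preprocessed
with torch.no_grad():
    frame_feats = backbone(frames)     # ResNet(X_v; beta_v): shape (12, 2048)
z_v = frame_feats.mean(dim=0)          # AvePooling over key frames: shape (2048,)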

[0044] For the audio modality, the sound spectrogram is drawn, and a "CNN+LSTM" (convolutional neural network + long short-term memory network) is used to extract the sound features of t...
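A minimal sketch of such a CNN+LSTM audio encoder over a spectrogram is shown below; the layer sizes, spectrogram shape, and output dimension are illustrative assumptions, not the patent's configuration.

import torch
import torch.nn as nn

class AudioCNNLSTM(nn.Module):
    """Hypothetical CNN+LSTM audio encoder (illustrative sketch only)."""
    def __init__(self, n_mels=64, d_audio=128):
        super().__init__()
        # CNN over the (frequency, time) spectrogram image.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),  # pool frequency, keep time resolution
        )
        # LSTM over the time axis of the CNN feature map.
        self.lstm = nn.LSTM(input_size=16 * (n_mels // 2),
                            hidden_size=d_audio, batch_first=True)

    def forward(self, spec):                   # spec: (batch, 1, n_mels, time)
        f = self.cnn(spec)                     # (batch, 16, n_mels//2, time)
        f = f.permute(0, 3, 1, 2).flatten(2)   # (batch, time, 16 * n_mels//2)
        _, (h, _) = self.lstm(f)               # last hidden state summarizes the clip
        return h[-1]                           # (batch, d_audio) audio feature

z_a = AudioCNNLSTM()(torch.randn(2, 1, 64, 100))  # e.g. 2 clips, 100 time frames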



Abstract

The invention discloses a short video classification method based on multi-modal feature complete representation. For the content information of a short video, four subspaces are constructed from the perspective of modality missing, centered on the visual modality features, and potential feature representations are obtained for each; the potential feature representations of the four subspaces are then fused with an auto-encoding and decoding network so that a more robust and effective common potential representation is learned. For the label information, inverse covariance estimation and a graph attention network are used to explore the correlation between labels and update the label representation, yielding the label vector representation corresponding to the short video. A multi-head cross-modal fusion scheme based on multi-head attention is applied to the common potential representation and the label vector representation to obtain the label prediction score of the short video. The overall loss function of the model is composed of the traditional multi-label classification loss and the reconstruction loss of the auto-encoding and decoding network; it measures the difference between the network output and the actual value and guides the network toward the optimal solution of the model.
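To make the final stage of the abstract concrete, the sketch below shows one plausible reading in PyTorch: label vector representations attend to the common latent representation via multi-head attention, a linear scorer produces per-label prediction scores, and the total loss is a multi-label classification loss plus the autoencoder reconstruction loss. All dimensions, the single attention layer, and the linear scorer are assumptions, not the patent's actual architecture.

import torch
import torch.nn as nn

d_model, n_labels, n_heads = 128, 30, 4  # illustrative sizes

fusion = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
scorer = nn.Linear(d_model, 1)

common = torch.randn(2, 1, d_model)         # common latent representation (dummy)
labels = torch.randn(2, n_labels, d_model)  # label vector representations (dummy)

# Labels query the common representation; one fused vector per label.
fused, _ = fusion(query=labels, key=common, value=common)
scores = scorer(fused).squeeze(-1)           # (batch, n_labels) label prediction scores

targets = torch.randint(0, 2, (2, n_labels)).float()
recon_loss = torch.tensor(0.1)               # reconstruction loss from the autoencoder step
total_loss = nn.functional.binary_cross_entropy_with_logits(scores, targets) \
             + recon_loss                    # overall loss: classification + reconstruction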

Description

Technical Field

[0001] The invention relates to the field of short video classification, and in particular to a short video classification method based on the complete representation of multi-modal features.

Background Technique

[0002] In recent years, with the popularity of smart terminals and social networks, more and more information is presented as multimedia content. High-definition cameras, large-capacity storage, and high-speed network connections have created extremely convenient shooting and sharing conditions for users, producing massive amounts of multimedia data.

[0003] As a new type of user-generated content, short videos have been widely welcomed on social networks due to their unique advantages such as low barriers to creation, fragmented content, and strong social attributes. Especially since 2011, with the popularization of mobile Internet terminals, faster networks, and lower traffic charges, short videos have quickly...


Application Information

IPC(8): G06K9/00; G06K9/46; G06K9/62; G06F16/71; G06F16/75; G06F16/78; G06F16/783
CPC: G06F16/71; G06F16/75; G06F16/7847; G06F16/7867; G06V20/41; G06V20/46; G06V10/462; G06F18/2431; G06F18/253
Inventor: 井佩光, 张丽娟, 苏育挺
Owner: TIANJIN UNIV