
A short video classification method and device based on multimodal joint learning

A short video classification method, applied to neural learning methods, video data clustering/classification, video data retrieval, etc., achieving the effects of solving the multi-label classification problem, ensuring objectivity, and improving classification accuracy

Active Publication Date: 2022-05-17
泉州津大智能研究院有限公司
Cites: 5 · Cited by: 0

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to address the deficiencies of the prior art by proposing a short video classification method and device based on multimodal joint learning that makes full use of the modal information and label information of short videos, effectively solves the short-video multi-label classification problem, and improves classification accuracy.



Examples


Embodiment Construction

[0061] As shown in Figure 1, the short video classification method based on multimodal joint learning includes the following steps:

[0062] A. Extract the visual modality features $z_v$, audio modality features $z_a$, and text modality features $z_t$ from a complete short video; specifically:

[0063] First apply a ResNet (residual network) to the key frames of the short video, then perform an average pooling operation over all frames to obtain the visual modality features $z_v$.
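For illustration, a minimal sketch of this step in PyTorch, assuming a ResNet-50 backbone and 2048-dimensional frame features (the patent does not specify the ResNet variant, the feature dimension, or any framework):

```python
import torch
import torchvision.models as models

# Sketch of paragraph [0063]: per-key-frame ResNet features, then average
# pooling over all frames. The backbone choice (resnet50) is an assumption.
resnet = models.resnet50(weights=None)
resnet.fc = torch.nn.Identity()          # keep the 2048-d pooled features
resnet.eval()

def extract_visual_features(key_frames: torch.Tensor) -> torch.Tensor:
    """key_frames: (num_frames, 3, H, W) -> z_v: (2048,)"""
    with torch.no_grad():
        per_frame = resnet(key_frames)   # (num_frames, 2048)
    return per_frame.mean(dim=0)         # average pooling over all frames
```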

[0064] Extract the audio modality features $z_a$ using a long short-term memory (LSTM) network.
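A corresponding sketch for the audio branch; the input features (40-dimensional frame vectors, e.g. MFCCs), the hidden size, and the use of the final hidden state as $z_a$ are all assumptions, since the patent names only the network type:

```python
import torch
import torch.nn as nn

# Sketch of paragraph [0064]: an LSTM over a sequence of audio frame features.
audio_lstm = nn.LSTM(input_size=40, hidden_size=256, batch_first=True)

def extract_audio_features(audio_frames: torch.Tensor) -> torch.Tensor:
    """audio_frames: (1, seq_len, 40) -> z_a: (256,)"""
    _, (h_n, _) = audio_lstm(audio_frames)
    return h_n[-1, 0]                    # final hidden state as z_a
```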

[0065] Extract the text modality features $z_t$ using a multi-layer perceptron (MLP).
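And a sketch for the text branch; the 300-dimensional input (e.g. an averaged word embedding) and the layer sizes are likewise assumptions:

```python
import torch.nn as nn

# Sketch of paragraph [0065]: an MLP over a fixed-size text embedding,
# so that z_t = text_mlp(text_embedding). All dimensions are illustrative.
text_mlp = nn.Sequential(
    nn.Linear(300, 512),
    nn.ReLU(),
    nn.Linear(512, 256),
)
```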

[0066] In the above, $X = \{X_v, X_a, X_t\}$ denotes the short video, where $X_v$, $X_a$, and $X_t$ denote its raw visual, audio, and text information, and $\beta_v$, $\beta_a$, and $\beta_t$ denote the network parameters used to extract the visual, audio, and text modality features, respectively.
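The equations that paragraphs [0063] to [0065] originally pointed to are not reproduced in this text; a plausible reconstruction from the definitions in [0066], with $K$ denoting the (assumed) number of key frames and $X_v^{(k)}$ the $k$-th key frame, is:

```latex
z_v = \frac{1}{K}\sum_{k=1}^{K}\mathrm{ResNet}\!\left(X_v^{(k)};\,\beta_v\right),\qquad
z_a = \mathrm{LSTM}\!\left(X_a;\,\beta_a\right),\qquad
z_t = \mathrm{MLP}\!\left(X_t;\,\beta_t\right)
```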



Abstract

The present invention provides a short video classification method and device based on multimodal joint learning, comprising the following steps: A, extract the visual modality features $z_v$, audio modality features $z_a$, and text modality features $z_t$ of the short video; B, learn the corresponding latent representation features of each modality; C, construct a reconstruction loss function; D, obtain a label feature matrix P composed of the label vectors; E, use multi-head attention to obtain the final representation of the short video; F, perform multi-label classification on the final representation to obtain a classification loss function; and construct the objective function from the reconstruction loss function and the classification loss function. The present invention makes full use of the modal information and label information of short videos, effectively solves the short-video multi-label classification problem, and improves classification accuracy.
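To make the pipeline concrete, here is a minimal end-to-end sketch of steps D through F in PyTorch. The use of the label feature matrix P as attention queries, all dimensions, the binary cross-entropy classification loss, and the unweighted sum of the two losses are assumptions; the abstract names only the components:

```python
import torch
import torch.nn as nn

# Sketch of steps D-F: fuse the modality latent representations with
# multi-head attention (label feature matrix P as queries, modality
# latents as keys/values), then perform multi-label classification and
# combine the two losses into one objective.
d, num_labels = 256, 10
attention = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)
classifier = nn.Linear(d, 1)

def classify(s_v, s_a, s_t, P, targets, recon_loss):
    """s_*: (1, d) modality latents; P: (num_labels, d);
    targets: (num_labels,) float multi-label ground truth."""
    modalities = torch.stack([s_v, s_a, s_t], dim=1)              # (1, 3, d)
    fused, _ = attention(P.unsqueeze(0), modalities, modalities)  # (1, L, d)
    logits = classifier(fused).squeeze(-1).squeeze(0)             # (L,)
    cls_loss = nn.functional.binary_cross_entropy_with_logits(logits, targets)
    return logits, recon_loss + cls_loss   # objective = reconstruction + classification
```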

Description

Technical Field

[0001] The invention relates to a short video classification method and device based on multimodal joint learning.

Background Art

[0002] In recent years, with the rapid development of digital media technology, the popularization of smart terminals, and the rise of social networks, more and more information is presented as multimedia content. High-definition cameras, large-capacity storage, and high-speed network connections have created extremely convenient conditions for users to shoot and share content, generating massive amounts of multimedia data.

[0003] As a new type of user-generated content, short videos have been warmly welcomed on social networks thanks to their unique advantages, such as a low barrier to creation, fragmented content, and strong social attributes. Especially since 2011, with the popularization of mobile Internet terminals, faster networks, and lower data charges, short videos have quickly won the support and favor of users.


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06F16/75; G06F16/78; G06F16/783; G06K9/62; G06N3/04; G06N3/08; G06Q10/06
CPC: G06F16/75; G06F16/7867; G06F16/7847; G06F16/7834; G06F16/7844; G06N3/08; G06Q10/06393; G06N3/045; G06F18/253
Inventor: 苏育挺
Owner: 泉州津大智能研究院有限公司