
A short video classification method and device based on multimodal joint learning

A short video classification method, applied to neural learning methods, video data clustering/classification, video data retrieval, etc., achieving the effects of solving the multi-label classification problem, ensuring objectivity, and improving classification accuracy

Active Publication Date: 2022-05-17
泉州津大智能研究院有限公司
Cites: 5 · Cited by: 0

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to address the deficiencies of the prior art by proposing a short video classification method and device based on multimodal joint learning that makes full use of the modal information and label information of short videos, effectively solves the short-video multi-label classification problem, and improves classification accuracy.



Examples


Embodiment Construction

[0061] As shown in Figure 1, the short video classification method based on multimodal joint learning includes the following steps:

[0062] A. Extract the visual modality features $z_v$, audio modality features $z_a$, and text modality features $z_t$ from a complete short video; specifically:

[0063] First apply a ResNet (residual network) to the key frames of the short video, then perform an average pooling operation over all frames to obtain the visual modality features $z_v$.
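For illustration, a minimal sketch of this step in PyTorch, assuming a ResNet-50 backbone and 2048-dimensional frame features (the patent does not specify the ResNet variant, the feature dimension, or any framework):

```python
import torch
import torchvision.models as models

# Sketch of paragraph [0063]: per-key-frame ResNet features, then average
# pooling over all frames. The backbone choice (resnet50) is an assumption.
resnet = models.resnet50(weights=None)
resnet.fc = torch.nn.Identity()          # keep the 2048-d pooled features
resnet.eval()

def extract_visual_features(key_frames: torch.Tensor) -> torch.Tensor:
    """key_frames: (num_frames, 3, H, W) -> z_v: (2048,)"""
    with torch.no_grad():
        per_frame = resnet(key_frames)   # (num_frames, 2048)
    return per_frame.mean(dim=0)         # average pooling over all frames
```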

[0064] Extract the audio modality features $z_a$ using a long short-term memory (LSTM) network.
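A corresponding sketch for the audio branch; the input features (40-dimensional frame vectors, e.g. MFCCs), the hidden size, and the use of the final hidden state as $z_a$ are all assumptions, since the patent names only the network type:

```python
import torch
import torch.nn as nn

# Sketch of paragraph [0064]: an LSTM over a sequence of audio frame features.
audio_lstm = nn.LSTM(input_size=40, hidden_size=256, batch_first=True)

def extract_audio_features(audio_frames: torch.Tensor) -> torch.Tensor:
    """audio_frames: (1, seq_len, 40) -> z_a: (256,)"""
    _, (h_n, _) = audio_lstm(audio_frames)
    return h_n[-1, 0]                    # final hidden state as z_a
```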

[0065] Extract the text modality features $z_t$ using a multi-layer perceptron (MLP).
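And a sketch for the text branch; the 300-dimensional input (e.g. an averaged word embedding) and the layer sizes are likewise assumptions:

```python
import torch.nn as nn

# Sketch of paragraph [0065]: an MLP over a fixed-size text embedding,
# so that z_t = text_mlp(text_embedding). All dimensions are illustrative.
text_mlp = nn.Sequential(
    nn.Linear(300, 512),
    nn.ReLU(),
    nn.Linear(512, 256),
)
```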

[0066] In the above, $X = \{X_v, X_a, X_t\}$ denotes the short video, where $X_v$, $X_a$, and $X_t$ denote its raw visual, audio, and text information, and $\beta_v$, $\beta_a$, and $\beta_t$ denote the network parameters used to extract the visual, audio, and text modality features, respectively.
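The equations that paragraphs [0063] to [0065] originally pointed to are not reproduced in this text; a plausible reconstruction from the definitions in [0066], with $K$ denoting the (assumed) number of key frames and $X_v^{(k)}$ the $k$-th key frame, is:

```latex
z_v = \frac{1}{K}\sum_{k=1}^{K}\mathrm{ResNet}\!\left(X_v^{(k)};\,\beta_v\right),\qquad
z_a = \mathrm{LSTM}\!\left(X_a;\,\beta_a\right),\qquad
z_t = \mathrm{MLP}\!\left(X_t;\,\beta_t\right)
```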



Abstract

The present invention provides a short video classification method and device based on multimodal joint learning, comprising the following steps: A, extract the visual modality features $z_v$, audio modality features $z_a$, and text modality features $z_t$ of the short video; B, learn the corresponding latent representation features of each modality; C, construct a reconstruction loss function; D, obtain a label feature matrix P composed of the label vectors; E, use multi-head attention to obtain the final representation of the short video; F, perform multi-label classification on the final representation to obtain a classification loss function; and construct the objective function from the reconstruction loss function and the classification loss function. The present invention makes full use of the modal information and label information of short videos, effectively solves the short-video multi-label classification problem, and improves classification accuracy.
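To make the pipeline concrete, here is a minimal end-to-end sketch of steps D through F in PyTorch. The use of the label feature matrix P as attention queries, all dimensions, the binary cross-entropy classification loss, and the unweighted sum of the two losses are assumptions; the abstract names only the components:

```python
import torch
import torch.nn as nn

# Sketch of steps D-F: fuse the modality latent representations with
# multi-head attention (label feature matrix P as queries, modality
# latents as keys/values), then perform multi-label classification and
# combine the two losses into one objective.
d, num_labels = 256, 10
attention = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)
classifier = nn.Linear(d, 1)

def classify(s_v, s_a, s_t, P, targets, recon_loss):
    """s_*: (1, d) modality latents; P: (num_labels, d);
    targets: (num_labels,) float multi-label ground truth."""
    modalities = torch.stack([s_v, s_a, s_t], dim=1)              # (1, 3, d)
    fused, _ = attention(P.unsqueeze(0), modalities, modalities)  # (1, L, d)
    logits = classifier(fused).squeeze(-1).squeeze(0)             # (L,)
    cls_loss = nn.functional.binary_cross_entropy_with_logits(logits, targets)
    return logits, recon_loss + cls_loss   # objective = reconstruction + classification
```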

Description

Technical Field

[0001] The invention relates to a short video classification method and device based on multimodal joint learning.

Background Art

[0002] In recent years, with the rapid development of digital media technology, the popularization of smart terminals, and the rise of social networks, more and more information is presented as multimedia content. High-definition cameras, large-capacity storage, and high-speed network connections have created extremely convenient conditions for users to shoot and share content, generating massive amounts of multimedia data.

[0003] As a new type of user-generated content, short videos have been warmly welcomed on social networks thanks to their unique advantages, such as a low barrier to creation, fragmented content, and strong social attributes. Especially since 2011, with the popularization of mobile Internet terminals, faster networks, and lower data charges, short videos have quickly won the support and favor of users.


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06F16/75; G06F16/78; G06F16/783; G06K9/62; G06N3/04; G06N3/08; G06Q10/06
CPC: G06F16/75; G06F16/7867; G06F16/7847; G06F16/7834; G06F16/7844; G06N3/08; G06Q10/06393; G06N3/045; G06F18/253
Inventor: 苏育挺
Owner: 泉州津大智能研究院有限公司