Multi-modal joint representation learning method and system based on variational distillation

A learning method in multi-modal technology, applied to neural learning methods, character and pattern recognition, and biological neural network models. It addresses the lack of a unified modal distillation method, mitigates forgetting, is simple and effective, and reduces information loss.

Pending Publication Date: 2022-08-02
SUZHOU UNIV


Problems solved by technology

[0006] The technical problem to be solved by the present invention is to overcome the shortcomings of the prior art by proposing a multimodal joint representation learning method and system based on variational distillation. It addresses the prior art's lack of a unified modal distillation method and outperforms existing baseline models on datasets of different modalities.



Examples


Embodiment 1

[0036] Referring to Figures 1 and 2, this embodiment provides a multimodal joint representation learning method based on variational distillation, comprising the following steps:

[0037] S1: Deploy a student model and a teacher model. The teacher model comprises a text teacher model and an image teacher model; the student model comprises a multimodal data unification module. Input raw multimodal data, where the raw multimodal data comprises original text modal data and original image modal data. Input the original text modal data and the original image modal data into the multimodal data unification module to obtain a text modal input and an image modal input with the same input form, and normalize the text modal input and the image modal input;
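As an illustration of step S1, the sketch below shows one way to bring two modalities into the same input form and then normalize them. It is a hypothetical stand-in, not the patent's implementation: the function name, the fixed random projection matrices (standing in for learned projection layers), and the shared dimensionality are all assumptions.

```python
import numpy as np


def unify_and_normalize(text_features: np.ndarray,
                        image_features: np.ndarray,
                        dim: int = 256):
    """Project text and image features to one shared dimensionality,
    then L2-normalize each row so both modalities share the same
    input form. Random projections stand in for learned layers."""
    rng = np.random.default_rng(0)
    w_text = rng.standard_normal((text_features.shape[1], dim))
    w_image = rng.standard_normal((image_features.shape[1], dim))

    text_in = text_features @ w_text      # unified text modal input
    image_in = image_features @ w_image   # unified image modal input

    # Row-wise L2 normalization of both modal inputs.
    text_in /= np.linalg.norm(text_in, axis=1, keepdims=True)
    image_in /= np.linalg.norm(image_in, axis=1, keepdims=True)
    return text_in, image_in
```

After this step both modal inputs have identical shape and unit norm, so a single downstream module can consume either one.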

[0038] S2: The student model comprises a modal joint representation module, and the normalized text modal input and image modal input are respectively input to the modal joint representa...

Embodiment 2

[0089] The following describes a multimodal joint representation learning system based on variational distillation disclosed in the second embodiment of the present invention. The system described below and the multimodal joint representation learning method based on variational distillation described above correspond to each other and may be cross-referenced.

[0090] The second embodiment of the present invention discloses a multimodal joint representation learning system based on variational distillation, including:

[0091] A student model, which comprises a multimodal data unification module and a modal joint representation module and receives original multimodal data, where the original multimodal data comprises original text modal data and original image modal data. The original text modal data and the original image modal data are input to the multimodal data unification module, the tex...



Abstract

The invention relates to a multi-modal joint representation learning method based on variational distillation, comprising: deploying a student model, a text teacher model, and an image teacher model; arranging multi-modal data, including original text modal data and original image modal data, to obtain a text modal input and an image modal input with the same input form; inputting the text modal input and the image modal input into a modal joint representation module to obtain a text output and an image output, and inputting the original text modal data and the original image modal data into the text teacher model and the image teacher model to obtain corresponding teacher text and image outputs; representing the correlation between the outputs of the student model and the teacher models by variational mutual information; and performing joint distillation training on the text output and the image output with a distillation loss function, so that the student model acquires the ability to match the teacher models. The invention provides a multi-modal joint representation learning method and system based on variational distillation that surpass existing baseline models on datasets of different modalities.
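A common way to turn a variational lower bound on the mutual information between teacher and student representations into a distillation loss is to model the teacher's output given the student's output as a diagonal Gaussian and minimize the negative log-likelihood. The sketch below is a generic formulation under that assumption, not the patent's exact loss function; the function and variable names are hypothetical.

```python
import numpy as np


def variational_distillation_loss(teacher_out: np.ndarray,
                                  student_out: np.ndarray,
                                  log_var: np.ndarray) -> float:
    """Negative variational lower bound on I(teacher; student),
    up to a constant: model q(teacher | student) as a diagonal
    Gaussian centered at the student output with per-dimension
    variance exp(log_var), and minimize the Gaussian NLL.
    Minimizing this pulls the student toward the teacher while
    letting the learned variance down-weight noisy dimensions."""
    var = np.exp(log_var)
    nll = 0.5 * (log_var + (teacher_out - student_out) ** 2 / var)
    return float(nll.mean())
```

In joint training, one such term would be computed for the text outputs (student vs. text teacher) and one for the image outputs (student vs. image teacher), then combined into a single distillation objective.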

Description

technical field

[0001] The invention relates to the technical field of multimodal distillation, and in particular to a multimodal joint representation learning method and system based on variational distillation.

Background technique

[0002] Large-scale pre-trained models, such as BERT, GPT, and RoBERTa in the text modality, or ResNet, BiT, and ViT in the image modality, have brought revolutionary advances in their respective modalities. However, as pre-trained models grow larger, deploying them in resource-poor environments becomes increasingly challenging. Therefore, model compression methods that reduce the size of pre-trained models while preserving most of their performance are gaining attention.

[0003] In the text modality, PKD was an early exploration; it is simple and effective, mainly compressing the BERT model in the fine-tuning stage. Subsequently, DistilBERT, TinyBERT, and MobileBERT used a KL-divergence or L2 loss function to perform task-independ...
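The KL-divergence distillation mentioned above matches temperature-softened teacher and student output distributions. The sketch below is a generic soft-label formulation of that idea, not the code of any of the named models:

```python
import numpy as np


def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Numerically stable temperature-scaled softmax."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kd_kl_loss(teacher_logits: np.ndarray,
               student_logits: np.ndarray,
               temperature: float = 2.0) -> float:
    """KL(teacher || student) over temperature-softened
    distributions, scaled by T^2 (the usual gradient correction
    for soft-label distillation)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float(temperature ** 2 * kl.mean())
```

The loss is zero when the student's distribution exactly matches the teacher's and grows as the two diverge; a higher temperature exposes more of the teacher's "dark knowledge" in the non-argmax classes.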

Claims


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06N3/08, G06N3/04, G06K9/62, G06V10/764, G06V10/774, G06V10/778, G06V10/82, G06F16/35
CPC: G06N3/082, G06F16/353, G06N3/045, G06F18/217, G06F18/24, G06F18/214
Inventors: 张亚伟, 王晶晶, 李寿山
Owner: SUZHOU UNIV