Multi-modal joint representation learning method and system based on variational distillation

A learning method in multi-modal technology, applied to neural learning methods, character and pattern recognition, and biological neural network models. It addresses the lack of a unified modal distillation method, mitigates forgetting, is simple and effective, and reduces information loss.

Pending Publication Date: 2022-08-02
SUZHOU UNIV


Problems solved by technology

[0006] The technical problem to be solved by the present invention is to overcome the shortcomings of the prior art by proposing a multimodal joint representation learning method and system based on variational distillation. It addresses the prior art's lack of a unified modal distillation method and outperforms existing baseline models on datasets of different modalities.



Examples


Embodiment 1

[0036] Referring to Figures 1 and 2, this embodiment provides a multimodal joint representation learning method based on variational distillation, comprising the following steps:

[0037] S1: Deploy a student model and a teacher model. The teacher model comprises a text teacher model and an image teacher model; the student model comprises a multimodal data unification module. Input raw multimodal data, where the raw multimodal data comprises original text modal data and original image modal data. Input the original text modal data and the original image modal data into the multimodal data unification module to obtain a text modal input and an image modal input with the same input form, and normalize the text modal input and the image modal input;
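As an illustration of step S1, the sketch below shows one way to bring two modalities into the same input form and then normalize them. It is a hypothetical stand-in, not the patent's implementation: the function name, the fixed random projection matrices (standing in for learned projection layers), and the shared dimensionality are all assumptions.

```python
import numpy as np


def unify_and_normalize(text_features: np.ndarray,
                        image_features: np.ndarray,
                        dim: int = 256):
    """Project text and image features to one shared dimensionality,
    then L2-normalize each row so both modalities share the same
    input form. Random projections stand in for learned layers."""
    rng = np.random.default_rng(0)
    w_text = rng.standard_normal((text_features.shape[1], dim))
    w_image = rng.standard_normal((image_features.shape[1], dim))

    text_in = text_features @ w_text      # unified text modal input
    image_in = image_features @ w_image   # unified image modal input

    # Row-wise L2 normalization of both modal inputs.
    text_in /= np.linalg.norm(text_in, axis=1, keepdims=True)
    image_in /= np.linalg.norm(image_in, axis=1, keepdims=True)
    return text_in, image_in
```

After this step both modal inputs have identical shape and unit norm, so a single downstream module can consume either one.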

[0038] S2: The student model comprises a modal joint representation module, and the normalized text modal input and image modal input are respectively input to the modal joint representa...

Embodiment 2

[0089] The following describes a multimodal joint representation learning system based on variational distillation disclosed in the second embodiment of the present invention. The system described below and the multimodal joint representation learning method based on variational distillation described above correspond to each other and may be cross-referenced.

[0090] The second embodiment of the present invention discloses a multimodal joint representation learning system based on variational distillation, including:

[0091] A student model, which comprises a multimodal data unification module and a modal joint representation module and receives original multimodal data, where the original multimodal data comprises original text modal data and original image modal data. The original text modal data and the original image modal data are input to the multimodal data unification module, the tex...



Abstract

The invention relates to a multi-modal joint representation learning method based on variational distillation, comprising: deploying a student model, a text teacher model, and an image teacher model; arranging multi-modal data, including original text modal data and original image modal data, to obtain a text modal input and an image modal input with the same input form; inputting the text modal input and the image modal input into a modal joint representation module to obtain a text output and an image output, and inputting the original text modal data and the original image modal data into the text teacher model and the image teacher model to obtain corresponding teacher text and image outputs; representing the correlation between the outputs of the student model and the teacher models by variational mutual information; and performing joint distillation training on the text output and the image output with a distillation loss function, so that the student model acquires the ability to match the teacher models. The invention provides a multi-modal joint representation learning method and system based on variational distillation that surpass existing baseline models on datasets of different modalities.
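A common way to turn a variational lower bound on the mutual information between teacher and student representations into a distillation loss is to model the teacher's output given the student's output as a diagonal Gaussian and minimize the negative log-likelihood. The sketch below is a generic formulation under that assumption, not the patent's exact loss function; the function and variable names are hypothetical.

```python
import numpy as np


def variational_distillation_loss(teacher_out: np.ndarray,
                                  student_out: np.ndarray,
                                  log_var: np.ndarray) -> float:
    """Negative variational lower bound on I(teacher; student),
    up to a constant: model q(teacher | student) as a diagonal
    Gaussian centered at the student output with per-dimension
    variance exp(log_var), and minimize the Gaussian NLL.
    Minimizing this pulls the student toward the teacher while
    letting the learned variance down-weight noisy dimensions."""
    var = np.exp(log_var)
    nll = 0.5 * (log_var + (teacher_out - student_out) ** 2 / var)
    return float(nll.mean())
```

In joint training, one such term would be computed for the text outputs (student vs. text teacher) and one for the image outputs (student vs. image teacher), then combined into a single distillation objective.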

Description

technical field

[0001] The invention relates to the technical field of multimodal distillation, and in particular to a multimodal joint representation learning method and system based on variational distillation.

Background technique

[0002] Large-scale pre-trained models, such as BERT, GPT, and RoBERTa in the text modality, or ResNet, BiT, and ViT in the image modality, have brought revolutionary advances in their respective modalities. However, as pre-trained models grow larger, deploying them in resource-poor environments becomes increasingly challenging. Therefore, model compression methods that reduce the size of pre-trained models while preserving most of their performance are gaining attention.

[0003] In the text modality, PKD was an early exploration; it is simple and effective, mainly compressing the BERT model in the fine-tuning stage. Subsequently, DistilBERT, TinyBERT, and MobileBERT used a KL-divergence or L2 loss function to perform task-independ...
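The KL-divergence distillation mentioned above matches temperature-softened teacher and student output distributions. The sketch below is a generic soft-label formulation of that idea, not the code of any of the named models:

```python
import numpy as np


def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Numerically stable temperature-scaled softmax."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kd_kl_loss(teacher_logits: np.ndarray,
               student_logits: np.ndarray,
               temperature: float = 2.0) -> float:
    """KL(teacher || student) over temperature-softened
    distributions, scaled by T^2 (the usual gradient correction
    for soft-label distillation)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float(temperature ** 2 * kl.mean())
```

The loss is zero when the student's distribution exactly matches the teacher's and grows as the two diverge; a higher temperature exposes more of the teacher's "dark knowledge" in the non-argmax classes.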

Claims


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06N3/08, G06N3/04, G06K9/62, G06V10/764, G06V10/774, G06V10/778, G06V10/82, G06F16/35
CPC: G06N3/082, G06F16/353, G06N3/045, G06F18/217, G06F18/24, G06F18/214
Inventors: 张亚伟, 王晶晶, 李寿山
Owner: SUZHOU UNIV