
Multimodal Retrieval Method Based on Online Deep Topic Model

A topic-model and multimodal technology, applied in the field of image processing, which addresses problems such as the inability to accurately describe the deep connections between features of different modalities, the difficulty of visualizing the relationship between hidden layers and observations, and the failure to mine cross-modal connections, and achieves dynamic multimodal retrieval, improved retrieval accuracy, and accurate cross-modal association and description.

Active Publication Date: 2022-05-03
XIDIAN UNIV

AI Technical Summary

Problems solved by technology

[0005] This multimodal retrieval method based on deep neural networks introduces a learning method based on contrastive divergence and multi-prediction training: the structure is stacked repeatedly into a deep network and the whole network is adjusted, the highest layers of the modality-specific networks are then shared, the entire neural network is trained by joint learning, and the shared hidden layer (the highest layer) is used as the joint feature representation. Although this method can mine connections between the features of different modalities, it has the following shortcomings: because of the "black box" nature of deep neural networks, the hidden units of the multi-layer restricted Boltzmann machine are restricted to binary values, so their expressive power is limited and the deep connections between different modal features cannot be described accurately; moreover, the mapping between the hidden layer of the restricted Boltzmann machine and the observed data is nonlinear, which makes the relationship between the hidden layer and the observations hard to visualize.
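
For orientation, the shared-top-layer idea described in [0005] can be sketched as follows. This is only an illustrative sketch using ordinary feed-forward layers rather than the RBMs the text describes, and all layer names and sizes are assumptions:

```python
import torch
import torch.nn as nn

class SharedTopJointNet(nn.Module):
    """Modality-specific lower networks whose outputs pass through one shared
    highest layer, which serves as the joint feature representation."""
    def __init__(self, img_dim=4096, txt_dim=2000, hid_dim=1024, joint_dim=512):
        super().__init__()
        self.img_net = nn.Sequential(nn.Linear(img_dim, hid_dim), nn.ReLU())  # image-specific network
        self.txt_net = nn.Sequential(nn.Linear(txt_dim, hid_dim), nn.ReLU())  # text-specific network
        self.shared_top = nn.Linear(hid_dim, joint_dim)                        # shared highest layer

    def forward(self, img_x, txt_x):
        joint_img = self.shared_top(self.img_net(img_x))
        joint_txt = self.shared_top(self.txt_net(txt_x))
        return joint_img, joint_txt  # joint features used for cross-modal retrieval
```

In such a scheme the two joint features would typically be aligned with a matching or reconstruction loss during joint training.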
[0007] Although this topic-model-based article feature extraction and retrieval method can directly build a probability model for the multimodal input and turn the joint feature representation problem into a hidden-layer inference problem of a Bayesian model, its shortcoming is that it is limited by the traditional topic model, which is a shallow model: it can only construct shallow connections between different modalities and cannot mine the deeper connections between them, which degrades retrieval performance.



Examples


Embodiment Construction

[0022] Referring to figure 1, which is a flow chart of the multimodal retrieval method based on an online deep topic model of the present invention, the multimodal retrieval method based on an online deep topic model includes the following steps:

[0023] Step 1: obtain the MIR Flickr 25K data. The MIR Flickr 25K data includes J images downloaded from the social photography website Flickr together with their complete manual annotation labels; the j-th image has N_j words, where j indexes the J images and the N_j words are the complete manual annotation labels of the j-th image. All the words belonging to an image form its corresponding text, so that J images and J corresponding texts are obtained; the J images and the J corresponding texts are recorded as the dataset. The next step is to preprocess the dataset.
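
A minimal loading sketch for Step 1, assuming a simple directory layout with one tag file of annotated words per image; the paths and file format here are assumptions for illustration, not taken from the patent:

```python
from pathlib import Path

def load_mirflickr(root: str):
    """Return a list of (image_path, text) pairs: each image's N_j annotated
    words are joined into its corresponding text."""
    root = Path(root)
    dataset = []
    for img_path in sorted(root.glob("images/im*.jpg")):
        tag_file = root / "tags" / (img_path.stem + ".txt")   # assumed layout
        words = tag_file.read_text(encoding="utf-8").split() if tag_file.exists() else []
        dataset.append((img_path, " ".join(words)))
    return dataset  # J images and J corresponding texts

pairs = load_mirflickr("MIRFLICKR25K")
print(len(pairs), "image-text pairs")
```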

[0024] First, the J corresponding texts are preprocessed; the first step is to obtain the J corresponding text vocabularies:

[0025] 1a...
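
The remaining sub-steps are truncated in this extract. As a generic stand-in for obtaining the text vocabularies and a word-count text matrix from the J texts (not the patent's exact preprocessing), one could use something like:

```python
from sklearn.feature_extraction.text import CountVectorizer

# toy stand-in for the J corresponding texts built in Step 1
texts = ["sky cloud sunset", "dog grass park", "sky sea beach sunset"]

vectorizer = CountVectorizer()                    # word-level tokenisation (assumption)
text_matrix = vectorizer.fit_transform(texts)     # J x V word-count (text) matrix
vocabulary = vectorizer.get_feature_names_out()   # the text vocabulary
print(vocabulary)
print(text_matrix.toarray())
```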



Abstract

The invention discloses a multimodal retrieval method based on an online deep topic model, which belongs to the technical field of image processing. A dataset is obtained consisting of images and the several words included with each image; after preprocessing the dataset, a text matrix and an image feature matrix are obtained. A Poisson gamma belief network with T layers is established, and the variable weight matrix of each layer of the text matrix is obtained. From the variable weight matrix of each layer of the text matrix, and based on the online deep topic model, the optimal global topic parameter matrix of the image feature matrix and the optimal global topic parameter matrix of the text matrix are obtained. From the optimal global topic parameter matrix of the image feature matrix, the optimal global topic parameter matrix of the text matrix, and the image feature matrix, the predicted word matrix of the text matrix is obtained as the multimodal retrieval result of the online deep topic model of the present invention.
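
At the level of matrix shapes, the data flow of the abstract can be illustrated as below. All matrices are random placeholders, the least-squares step merely stands in for the online-PGBN inference described in the abstract, and the sizes J, V, D and K are assumptions:

```python
import numpy as np

J, V, D, K = 100, 2000, 512, 50                        # texts, vocabulary size, image-feature dim, topics
text_matrix = np.random.poisson(1.0, size=(V, J))      # word-count text matrix, V x J
image_features = np.random.rand(D, J)                  # image feature matrix, D x J

Phi_text = np.random.dirichlet(np.ones(V), size=K).T   # global topic parameter matrix of the text, V x K
Phi_image = np.random.rand(D, K)                        # global topic parameter matrix of the image features, D x K

# stand-in for inference: recover per-document topic weights from the image side,
# then predict the word matrix of the text from those weights
theta, *_ = np.linalg.lstsq(Phi_image, image_features, rcond=None)  # K x J topic weights
predicted_words = Phi_text @ theta                      # V x J predicted word matrix (retrieval result)
print(predicted_words.shape)
```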

Description

Technical field

[0001] The invention belongs to the technical field of image processing, and in particular relates to a multimodal retrieval method based on an online deep topic model, which is suitable for quickly mining the deep connections between the two different modalities of image and text, extracting joint features, and using the extracted joint features for text-image retrieval.

Background technique

[0002] Multimodal retrieval technology jointly learns the features of different modalities and mines the connections between them to obtain joint features that contain multimodal information, so that data of different modalities can be generated from one another. The online deep topic model ONLINE-PGBN (Poisson Gamma Belief Network) is an online deep topic model based on the Bayesian framework; it has a multi-layer network structure, can quickly extract multi-layer features from data, and performs excellently in text processing. Based on the traditional topic model; ...
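
For reference, the Poisson gamma belief network underlying ONLINE-PGBN is usually written as the following generative model (standard notation from the PGBN literature, not copied from the patent text): the observed counts are Poisson-distributed given the first-layer topics, and each layer's gamma-distributed hidden units are shaped by the layer above,

$$
x_j \sim \operatorname{Pois}\!\big(\Phi^{(1)}\theta_j^{(1)}\big),\qquad
\theta_j^{(t)} \sim \operatorname{Gam}\!\big(\Phi^{(t+1)}\theta_j^{(t+1)},\ 1/c_j^{(t+1)}\big),\ t=1,\dots,T-1,\qquad
\theta_j^{(T)} \sim \operatorname{Gam}\!\big(r,\ 1/c_j^{(T+1)}\big).
$$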

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06F16/58; G06N3/08
CPC: G06N3/084
Inventor: 陈渤, 肖肃诚, 王超杰
Owner: XIDIAN UNIV