Text Image Multimodal Retrieval Method Based on Deep Topic Model

A topic-model and text-image technology, applied in still image data retrieval, unstructured text data retrieval, and still image data querying. It addresses the limited expressive power of existing methods and their inability to accurately describe the deep connections between features of different modalities, both of which degrade retrieval performance. The result is a more accurate description of cross-modal associations and improved retrieval accuracy.

Active Publication Date: 2019-10-11

XIDIAN UNIV

Problems solved by technology

Although this method can mine the connections between features of different modalities, it still has shortcomings. Because of the "black box" nature of deep neural networks, and because the hidden units of a multi-layer restricted Boltzmann machine are limited to binary values and therefore have limited expressive power, multimodal techniques based on deep neural networks cannot accurately describe the deep connections between features of different modalities. With respect to linear mapping, it is also difficult to visualize the relationship between the hidden layers and the observations.

Although this method can directly build a probabilistic model of the multimodal input, and turns the joint feature representation problem into the problem of inferring the hidden-layer distribution of a Bayesian model, it has a drawback: traditional topic models are all shallow models, so the method can only build shallow connections between modalities and cannot mine deeper ones, which hurts retrieval performance.


Embodiment Construction

[0035] The present invention will be described in further detail below in conjunction with the accompanying drawings.

[0036] The steps of the present invention are described in further detail with reference to Figure 1.

[0037] Step 1. Preprocess the training data and test data.

[0038] Randomly select 25,000 labeled text-image pairs from the MIR Flickr dataset, and use 15,000 of them as training data and the other 10,000 as test data.
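The split in [0038] can be sketched as follows. This is a minimal illustration, assuming the 25,000 selected pairs are already indexed from 0 to 24,999; the function name `split_pairs` and the fixed seed are hypothetical and not taken from the patent.

```python
import numpy as np

def split_pairs(n_total=25_000, n_train=15_000, seed=0):
    """Randomly split pair indices into disjoint training and test sets.

    Hypothetical helper: the patent only states the 15,000 / 10,000 sizes.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_total)      # random ordering of all pair indices
    return perm[:n_train], perm[n_train:]

train_idx, test_idx = split_pairs()
```

The returned index arrays can then be used to select the matching text and image entries, so that each pair stays together on the same side of the split.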

[0039] Count the number of occurrences of each word in the text data of the training data and the test data, sort the words in descending order of count, and take the first 2000 words as the vocabulary. For each text, count how many times each vocabulary word appears and store the counts in a vector; the value in each dimension of the vector is the number of times the corresponding word appears in the document.
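The vocabulary and count-vector construction in [0039] might look like the sketch below. It assumes each text is already tokenized into a list of words; the names `build_vocabulary` and `count_vector` and the toy documents are illustrative only.

```python
from collections import Counter
import numpy as np

def build_vocabulary(docs, vocab_size=2000):
    """Return the vocab_size most frequent words over all documents."""
    counts = Counter(word for doc in docs for word in doc)
    return [word for word, _ in counts.most_common(vocab_size)]

def count_vector(doc, vocab):
    """Count how often each vocabulary word occurs in one document."""
    index = {word: i for i, word in enumerate(vocab)}
    vec = np.zeros(len(vocab), dtype=np.int64)
    for word in doc:
        i = index.get(word)
        if i is not None:                # words outside the vocabulary are ignored
            vec[i] += 1
    return vec

# Toy example with a vocabulary of size 2 instead of 2000.
docs = [["sky", "sky", "tree", "tree"], ["sky", "dog"]]
vocab = build_vocabulary(docs, vocab_size=2)
vec = count_vector(["tree", "dog", "sky", "sky"], vocab)
```

Here `vocab` is `["sky", "tree"]` and `vec` is `[2, 1]`: "dog" falls outside the vocabulary and is not counted.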

[0040] Extract the features of each image to form an image feature matrix with the feature dimension as the number of rows and the t...


Abstract

The invention discloses a text-image multimodal retrieval method based on a deep topic model, which can be used for multimodal retrieval of text and images. The method comprises the following steps: (1) preprocessing the training data and test data; (2) initializing the hyperparameters and shared parameters of the deep topic model; (3) training the deep topic model; (4) training a classifier using the joint features; and (5) testing with the test data. The method uses the deep topic model to mine the deep-to-shallow relations between the hidden layers of the different modalities, and obtains joint features containing multimodal information for retrieval.

Description

Technical field

[0001] The invention belongs to the technical field of image processing, and further relates to a text-image multimodal retrieval method based on a deep topic model in the technical field of artificial intelligence. The invention can be used to mine the deep connection between the two modalities of image and text, extract joint features, and use the extracted joint features for text-image retrieval.

Background technique

[0002] Multimodal retrieval technology jointly learns features of different modalities and mines the connections between them to obtain joint features containing multimodal information, so that data of different modalities can be generated from one another. The Poisson Gamma Belief Network (PGBN) is a deep topic model based on a Bayesian framework. The PGBN model has a multi-layer network structure that can extract multi-layer features of the data, and it is superior to traditional topic models in text processing...
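To make the multi-layer structure of PGBN more concrete, the sketch below samples word counts from a small two-layer model. It follows the generative form commonly given for PGBN (gamma-distributed hidden units chained through Dirichlet-distributed topic matrices, with a Poisson likelihood on the counts); it is not taken from this patent, and the layer sizes, the priors `r` and `c`, and the name `sample_pgbn` are all assumptions chosen for illustration.

```python
import numpy as np

def sample_pgbn(vocab_size=50, layer_sizes=(8, 4), n_docs=5, seed=0):
    """Draw count data from a toy PGBN-style generative model (assumed form)."""
    rng = np.random.default_rng(seed)
    dims = (vocab_size, *layer_sizes)

    # Topic matrices Phi^(l): shape (dims[l-1], dims[l]), columns sum to 1.
    phis = [
        rng.dirichlet(np.full(dims[l], 0.5), size=dims[l + 1]).T
        for l in range(len(layer_sizes))
    ]

    r = np.ones(layer_sizes[-1])   # hypothetical top-layer gamma shape
    c = 1.0                        # hypothetical gamma rate, shared by all layers

    # Top layer: theta^(L) ~ Gamma(r, 1/c).
    theta = rng.gamma(shape=r[:, None], scale=1.0 / c,
                      size=(layer_sizes[-1], n_docs))

    # Lower layers: theta^(l) ~ Gamma(Phi^(l+1) theta^(l+1), 1/c).
    for phi in reversed(phis[1:]):
        shape = np.maximum(phi @ theta, 1e-12)   # numerical guard only
        theta = rng.gamma(shape=shape, scale=1.0 / c)

    # Observed counts: x ~ Poisson(Phi^(1) theta^(1)).
    counts = rng.poisson(phis[0] @ theta)
    return counts, phis

counts, phis = sample_pgbn()
```

Each column of `counts` plays the role of one bag-of-words vector; training such a model means inferring the topic matrices and hidden units from observed counts rather than sampling them.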


Application Information


Patent Type & Authority: Patents (China)

IPC(8): G06F16/53, G06F16/2458, G06F16/35, G06K9/62

Inventor: 陈渤, 周翼, 王超杰, 丛玉来

Owner: XIDIAN UNIV
