Image-text cross-modal retrieval based on multi-layer semantic deep Hash algorithm

A hash algorithm combined with cross-modal technology, applied in the field of cross-modal retrieval. It addresses the problems that real-world data carry multiple labels and that existing methods cannot fully preserve the associations between data, which degrades retrieval results, and it achieves the effect of improving retrieval accuracy.

Inactive Publication Date: 2019-08-09
BEIJING JIAOTONG UNIV

AI Technical Summary

Problems solved by technology

However, with the continuous growth of multimedia data, deep-learning feature representations face storage and retrieval-efficiency challenges because of their high dimensionality, which makes them unsuitable for large-scale multimedia retrieval tasks.
At the same time, the problem of cross-modal retrieval also faces th...

Method used




Detailed Description of the Embodiments

[0018] The present invention will be further described below in conjunction with specific examples:

[0019] In the present invention, the image and text modalities are taken as examples for discussion.

[0020] The present invention provides an image-text cross-modal retrieval method based on a multi-layer semantic deep hashing algorithm (Deep Multi-Level Semantic Hashing for Cross-modal Retrieval, DMSH), which comprises three modules: a deep feature extraction module, a similarity matrix generation module, and a hash code learning module, as shown in Figure 1.
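As a rough illustration of how these modules could fit together, the sketch below wires two modality-specific encoders to per-modality hash layers; all class, method, and parameter names are hypothetical and are not taken from the patent. The similarity matrix generation module is not part of the network itself: it builds the supervision matrix from labels and is used only in the loss (a sketch of it appears after the Abstract below).

```python
# Minimal, hypothetical skeleton of the two learned branches; names are
# illustrative assumptions, not taken from the patent.
import torch
import torch.nn as nn

class DMSHPipeline(nn.Module):
    def __init__(self, image_encoder: nn.Module, text_encoder: nn.Module,
                 feat_dim: int, hash_bits: int):
        super().__init__()
        # Deep feature extraction module: one encoder branch per modality.
        self.image_encoder = image_encoder
        self.text_encoder = text_encoder
        # Hash code learning module: one fully connected hash layer per modality.
        self.image_hash = nn.Linear(feat_dim, hash_bits)
        self.text_hash = nn.Linear(feat_dim, hash_bits)

    def forward(self, images: torch.Tensor, texts: torch.Tensor):
        # tanh keeps training differentiable; taking sign() of these outputs
        # yields the final binary hash codes at retrieval time.
        img_code = torch.tanh(self.image_hash(self.image_encoder(images)))
        txt_code = torch.tanh(self.text_hash(self.text_encoder(texts)))
        return img_code, txt_code
```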

[0021] Table 1 Image feature extraction network structure


[0024] The deep feature extraction module uses deep neural networks to extract image and text features. The deep convolutional neural network CNN-F is used for image feature extraction, and its network configuration is shown in Table 1. In the text feature extraction stage, the text data is first modeled...
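The following is a minimal sketch of such a two-branch feature extractor in PyTorch, assuming a CNN-F-style convolutional backbone for the image branch and, since paragraph [0024] is truncated and Table 1 is not reproduced above, an assumed bag-of-words multi-layer perceptron for the text branch; the layer sizes are illustrative, not the patent's.

```python
import torch.nn as nn

class CNNFImageNet(nn.Module):
    """Image branch in the style of CNN-F. Layer sizes follow the published
    CNN-F architecture; the patent's Table 1 is not reproduced above, so the
    exact configuration here is an assumption."""
    def __init__(self, feat_dim: int = 4096):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(4096), nn.ReLU(inplace=True), nn.Dropout(0.5),  # infers flattened size
            nn.Linear(4096, feat_dim), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.fc(self.features(x))

class TextMLP(nn.Module):
    """Assumed text branch: bag-of-words vectors passed through a two-layer
    perceptron (the patent's text model is truncated in the excerpt above)."""
    def __init__(self, vocab_size: int, feat_dim: int = 4096):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vocab_size, 8192), nn.ReLU(inplace=True),
            nn.Linear(8192, feat_dim), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)
```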



Abstract

The invention relates to an image-text cross-modal retrieval model that combines deep learning with hashing methods. To overcome the limitation that traditional deep-learning-based cross-modal hashing methods reduce multi-label data to a single label, the invention provides a deep cross-modal hashing algorithm based on multi-layer semantics. The similarity between data items is defined through the co-occurrence relation of their multi-label annotations and is used as the supervision information for network training. A loss function that jointly considers multi-layer semantic similarity and binary similarity is designed to train the network, so that feature extraction and hash code learning are unified in one framework and end-to-end learning is realized. The algorithm makes full use of the semantic correlation information between data and improves retrieval accuracy.
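As a rough illustration of the label co-occurrence similarity described above, the sketch below scores each image-text pair by the fraction of labels the two samples share; the patent only states that similarity is derived from multi-label co-occurrence, so the exact normalization used here is an assumption.

```python
import numpy as np

def multilevel_similarity(labels_a: np.ndarray, labels_b: np.ndarray) -> np.ndarray:
    """Pairwise similarity from multi-label annotations.

    labels_a: (n, c) binary label matrix for one modality's samples.
    labels_b: (m, c) binary label matrix for the other modality's samples.
    Returns an (n, m) matrix in [0, 1]; 0 means no shared label and 1 means
    identical label sets. The normalization (shared labels divided by labels
    in the union) is an assumed choice, not the patent's exact definition.
    """
    shared = labels_a @ labels_b.T                         # co-occurring labels per pair
    union = (labels_a.sum(1, keepdims=True)
             + labels_b.sum(1, keepdims=True).T - shared)  # labels present in either sample
    return np.where(union > 0, shared / np.maximum(union, 1), 0.0)

# Example: two images vs. two texts annotated with 4 candidate labels.
imgs = np.array([[1, 0, 1, 0], [0, 1, 0, 0]])
txts = np.array([[1, 0, 0, 0], [0, 1, 1, 0]])
print(multilevel_similarity(imgs, txts))  # graded values such as 0.5 and 0.33, not just 0/1
```

These graded values, rather than a single same-or-different flag, serve as the supervision signal the abstract refers to: the training loss can push the hash codes of an image-text pair closer together the higher their label-based similarity is.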

Description

Technical Field

[0001] The invention relates to the field of cross-modal retrieval, and in particular to an image-text cross-modal retrieval algorithm based on multi-layer semantics that combines deep learning with a hashing method.

Background

[0002] With the development of the mobile Internet and the popularization of smartphones, digital cameras and other devices, multimedia data on the Internet is growing explosively. In the field of information retrieval, the continuous growth of multimedia big data has created demand for cross-modal retrieval applications. However, current mainstream search engines, such as Baidu, Google and Bing, only return results in a single modality. In addition, as deep learning has achieved a series of breakthroughs in computer vision and natural language processing, combining multimedia big data with artificial intelligence is a common development trend for both fields. Therefore, combi...

Claims


Application Information

IPC(8): G06F16/583; G06F16/58
CPC: G06F16/5846; G06F16/5866; G06F16/583
Inventors: 冀振燕, 姚伟娜, 杨文韬, 皮怀雨
Owner: BEIJING JIAOTONG UNIV