Universal cross-modal retrieval model based on deep hash

A cross-modal retrieval and modeling technology, applied in network data retrieval, other database retrieval, biological neural network models, etc., which addresses the problems of retrieval delay and inefficiency, large storage space requirements, and the neglect of retrieval efficiency in existing models.

Pending Publication Date: 2021-07-06
CHINA UNIV OF PETROLEUM (EAST CHINA)


Problems solved by technology

[0004] Most existing techniques model directly on the extracted feature values to achieve cross-modal retrieval, which is very time-consuming on large-scale datasets and requires a large amount of storage space.



Examples


Embodiment 1

[0030] As shown in Figure 1, a universal cross-modal retrieval model based on deep hashing comprises an image model 1, a text model 2, a binary code conversion model 3 and a Hamming space 4, wherein:

[0031] The image model 1 extracts image features, abstracting the features and semantics of the original images;

[0032] The text model 2 converts the text data into vector form and extracts the features and semantics of the text;

[0033] The binary code conversion model 3 converts the features and semantics extracted by the image and text models into binary codes, thereby mapping data points from the original feature spaces of the different modalities into the common Hamming space;

[0034] The Hamming space 4 is the common subspace of the image and text feature spaces, in which the Hamming distance between the binary hash codes of the query data and of the original data is computed for similarity ranking (see the sketch below).
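Paragraphs [0031] to [0034] name the four components but the visible text does not give concrete layer definitions. The following is a minimal sketch of the text model, the binary code conversion, and the Hamming-space ranking, assuming PyTorch; the MLP text encoder, the tanh/sign binarization, and the 64-bit code length are illustrative assumptions, not taken from the patent.

```python
# Minimal sketch of components 2-4; layer sizes, the MLP text encoder and the
# tanh/sign binarization are assumptions for illustration, not patent text.
import torch
import torch.nn as nn

HASH_BITS = 64  # assumed code length

class TextModel(nn.Module):
    """Text model 2: maps a text vector (e.g. bag-of-words) to a continuous
    code in (-1, 1) with HASH_BITS dimensions."""
    def __init__(self, vocab_size=10000, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vocab_size, hidden), nn.ReLU(),
            nn.Linear(hidden, HASH_BITS), nn.Tanh(),
        )

    def forward(self, x):
        return self.net(x)

def to_binary_codes(continuous_codes: torch.Tensor) -> torch.Tensor:
    """Binary code conversion model 3: threshold (-1, 1) activations to {0, 1}."""
    return (continuous_codes > 0).to(torch.uint8)

def hamming_rank(query_code: torch.Tensor, db_codes: torch.Tensor) -> torch.Tensor:
    """Hamming space 4: rank database items by Hamming distance to the query."""
    distances = (query_code != db_codes).sum(dim=1)   # differing bits per item
    return torch.argsort(distances)                   # smallest distance first
```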

[0035] The image model 1 mainly recommends the use of CNN models such as ResNet and DenseNet...
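Paragraph [0035] only recommends CNN backbones such as ResNet or DenseNet for the image model. Below is a minimal sketch under that assumption, using torchvision's ResNet-50 with its classification head replaced by a hash projection; the tanh head and the 64-bit code length are illustrative, not specified by the patent.

```python
# Sketch of image model 1: ResNet-50 backbone (torchvision) whose classifier
# is replaced by a hash projection; the tanh head and code length are assumed.
import torch.nn as nn
from torchvision import models

class ImageModel(nn.Module):
    def __init__(self, hash_bits=64):
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        backbone.fc = nn.Identity()          # keep the 2048-d pooled feature
        self.backbone = backbone
        self.hash_layer = nn.Sequential(nn.Linear(2048, hash_bits), nn.Tanh())

    def forward(self, images):
        features = self.backbone(images)     # abstracted image features and semantics
        return self.hash_layer(features)     # continuous codes in (-1, 1)
```

A DenseNet backbone could be substituted in the same way, with the hash layer's input dimension adjusted to that backbone's feature size.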



Abstract

The invention discloses a universal cross-modal retrieval model based on deep hashing. The model comprises an image model, a text model, a binary code conversion model and a Hamming space. The image model performs feature and semantic extraction on the image data; the text model performs feature and semantic extraction on the text data; the binary code conversion model converts the original features into binary codes; the Hamming space is a common subspace of the image and text data, in which the similarity of cross-modal data can be computed directly. The model combines deep learning and hash learning to solve cross-modal retrieval: data points in the original feature spaces are mapped to binary codes in the common Hamming space, and similarity ranking is carried out by computing the Hamming distance between the codes of the query data and the codes of the original data, yielding the retrieval result and greatly improving retrieval efficiency. Because the binary codes replace the original data in storage, the storage capacity required by retrieval tasks is greatly reduced.
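The abstract states that binary codes replace the original feature storage and that retrieval reduces to Hamming-distance ranking. A minimal NumPy sketch of that ranking step is given below, assuming 64-bit codes packed into bytes; the database size and all array names are illustrative.

```python
# Sketch of the storage and ranking step described in the abstract: binary
# codes are packed into bytes instead of storing float features, and
# similarity ranking is XOR plus popcount. Sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
db_codes = rng.integers(0, 2, size=(100_000, 64), dtype=np.uint8)  # stored hash codes
query_code = rng.integers(0, 2, size=(1, 64), dtype=np.uint8)      # hash code of the query

# Binary codes replace the original data: 64 bits pack into 8 bytes per item.
db_packed = np.packbits(db_codes, axis=1)
query_packed = np.packbits(query_code, axis=1)

# Hamming distance = number of differing bits: XOR the packed codes, then count set bits.
xor = np.bitwise_xor(db_packed, query_packed)
distances = np.unpackbits(xor, axis=1).sum(axis=1)

# Similarity ranking: smaller Hamming distance means more similar.
ranking = np.argsort(distances)
print(ranking[:10], distances[ranking[:10]])
```

Storing 8 bytes per item instead of, say, a 2048-dimensional float32 feature vector (8 KB) reduces storage by roughly three orders of magnitude, which is the efficiency claim made in the abstract.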

Description

Technical field

[0001] The invention relates to the field of cross-modal retrieval, in particular to cross-modal retrieval between images and texts.

Background technique

[0002] In recent years, with the vigorous development of the Internet and the popularity of smart devices and social networks, multimedia data on the Internet has grown explosively. These massive data come in various modalities such as text, image, video, and audio, and the same thing is often described by data of many different modalities. Such data are "heterogeneous and multi-source" in form but semantically interrelated. People's demand for information acquisition is no longer satisfied by single-modal retrieval, and realizing cross-modal retrieval through knowledge collaboration across different modalities has become a research hotspot in recent years.

[0003] Deep learning has made breakthroughs in single-modal fields such as natural language processing, image recognition, and speech recognition. The powerful abstraction...

Claims


Application Information

IPC(8): G06F16/953, G06F40/30, G06K9/62, G06N3/04, G06N3/08
CPC: G06F16/953, G06F40/30, G06N3/049, G06N3/08, G06N3/045, G06F18/213, G06F18/214
Inventor: 段友祥, 陈宁, 孙歧峰
Owner: CHINA UNIV OF PETROLEUM (EAST CHINA)