Cross-modal deep hash retrieval method based on self-supervision

A cross-modal hashing technique, applied in the field of cross-modal deep hash retrieval, that addresses insufficient training, vanishing gradients in the neural network, and underuse of label data, and is reported to achieve excellent retrieval performance.

Active Publication Date: 2019-10-08

HARBIN INST OF TECH SHENZHEN GRADUATE SCHOOL

4 Cites, 36 Cited by

AI Technical Summary

Problems solved by technology

However, existing methods ignore the semantic value of the label data, which is the most important information available. They use labels only to generate a similarity matrix, and do not exploit them to capture finer-grained label information or to describe more accurately the categories to which the data belong.
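To make the limitation concrete, here is a minimal sketch of how a similarity matrix is commonly derived from labels in cross-modal hashing. It assumes multi-hot label vectors and the convention that two samples are similar when they share at least one label; the function name and the sample labels are hypothetical and are not taken from the patent.

```python
def similarity_matrix(image_labels, text_labels):
    """Return S where S[i][j] = 1 if image i and text j share any label, else 0."""
    return [
        [1 if any(a and b for a, b in zip(li, lj)) else 0 for lj in text_labels]
        for li in image_labels
    ]


# Hypothetical multi-hot labels over three categories.
image_labels = [[1, 0, 0], [0, 1, 1]]
text_labels = [[1, 1, 0], [0, 0, 1], [0, 0, 0]]

S = similarity_matrix(image_labels, text_labels)
for row in S:
    print(row)
```

The second image matches the first text through one category and the second text through another, yet both entries in the matrix are simply 1. Which categories matched, and how many, is discarded, which is the finer-grained label information the passage above refers to.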

In addition, existing data sets are imbalanced: across modalities, similar pairs are far fewer than dissimilar pairs. Existing methods do not address this imbalance, which easily leads to insufficient training or even overfitting.

Furthermore, because hash codes must be generated to represent data of each modality, most existing methods add a sigmoid function to the last layer of the neural network to compress its output to between 0 and 1, and then generate discrete binary codes. However, the sigmoid function can easily cause vanishing gradients during backpropagation, and compressing the network output directly to between 0 and 1 causes some loss of information for both image and text data.
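The vanishing-gradient concern can be checked numerically. The sketch below evaluates the standard sigmoid and its derivative, s(x)(1 - s(x)), at a few illustrative inputs chosen here for demonstration; it is not code from the patent.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # Derivative of the sigmoid: s(x) * (1 - s(x)).
    s = sigmoid(x)
    return s * (1.0 - s)


# Illustrative pre-activation values; larger magnitudes saturate the sigmoid.
for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x={x:5.1f}  sigmoid={sigmoid(x):.5f}  gradient={sigmoid_grad(x):.6f}")
```

The derivative peaks at 0.25 when the input is 0 and shrinks toward zero as the output approaches 0 or 1, so a last layer pushed toward binary-like outputs passes very little gradient back to earlier layers.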

Embodiment Construction

[0017] The present invention proposes a cross-modal deep hash retrieval method based on self-supervision. Specific embodiments of the present invention are described in detail below in conjunction with the accompanying drawings.

[0018] This cross-modal deep hash retrieval method constructs an independent category label processing network, trained in a self-supervised manner, to learn the semantic features of the label data and to model the semantic features shared between modalities. The label network also supervises the features extracted by the image and text networks so that their semantic feature distributions tend to be consistent, and the resulting hash codes better retain semantic information. In addition, to address the data imbalance in the training data set, a loss function with adaptive weights is proposed: the weights are adjusted automatically according to the proportion of relevant and irrelevant samples in each batch input to the network, so...
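The excerpt does not give the formula for the adaptive weights, so the following is only a sketch of one plausible scheme. It assumes a pairwise negative log-likelihood loss over image/text feature pairs and inverse-frequency weights recomputed for every batch; `pair_weights`, `adaptive_weighted_loss`, and the sample features are hypothetical names and values, not the patent's definitions.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pair_weights(S):
    """Assumed scheme: weight each pair type by the inverse of its share in the batch."""
    total = sum(len(row) for row in S)
    n_pos = sum(sum(row) for row in S)
    n_neg = total - n_pos
    return total / max(n_pos, 1), total / max(n_neg, 1)


def adaptive_weighted_loss(img_feats, txt_feats, S):
    """Weighted pairwise negative log-likelihood over all image/text pairs in a batch."""
    w_pos, w_neg = pair_weights(S)
    total = len(img_feats) * len(txt_feats)
    loss = 0.0
    for i, f in enumerate(img_feats):
        for j, g in enumerate(txt_feats):
            theta = 0.5 * sum(a * b for a, b in zip(f, g))
            nll = softplus(theta) - S[i][j] * theta
            loss += (w_pos if S[i][j] else w_neg) * nll
    return loss / (2 * total)


# Hypothetical batch: two image features, one text feature, one similar pair.
img_feats = [[1.0, 1.0], [-1.0, 1.0]]
txt_feats = [[1.0, 1.0]]
print(adaptive_weighted_loss(img_feats, txt_feats, [[1], [0]]))
```

Because the weights are derived from the batch itself, a batch with one similar pair among many dissimilar ones gives that similar pair a proportionally larger weight, so the rare relevant pairs are not drowned out.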

Abstract

The invention relates to a cross-modal joint hash retrieval method based on self-supervision. The method comprises the following steps. Step 1, processing the image modal data: a deep convolutional neural network extracts features from the image data and performs hash learning on it, and the number of nodes in the last fully connected layer of the network is set to the length of the hash code. Step 2, processing the text modal data: the text data is modeled with a bag-of-words model, and a two-layer fully connected neural network is built to extract features from it; the input of this network is the word vector produced by the bag-of-words model, and the first and second fully connected layers each have the same number of nodes as the hash code length. Step 3, processing the category labels: a neural network trained in a self-supervised manner extracts semantic features from the label data. Step 4, minimizing the distance between the features extracted by the image and text networks and the semantic features of the label network, so that the hash models of the image and text networks learn the semantic features shared across modalities more fully.
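The pieces named in the abstract can be illustrated with a small sketch. It shows a bag-of-words vector as the text input, binarisation of a code-length feature vector, Hamming distance for retrieval, and a squared distance between modality features and label features as the quantity minimized in step 4. Binarising with the sign function, the squared-distance form, and all names and numbers are assumptions made for illustration; the neural networks themselves are replaced by hard-coded feature vectors.

```python
def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the token list."""
    return [tokens.count(word) for word in vocabulary]


def to_hash_code(features):
    """Assumed binarisation: sign of each real-valued feature, as +1 or -1."""
    return [1 if v >= 0 else -1 for v in features]


def hamming_distance(code_a, code_b):
    """Number of positions at which two hash codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


def alignment_loss(modality_feats, label_feats):
    """Squared distance between a modality's features and the label features."""
    return sum((m - s) ** 2 for m, s in zip(modality_feats, label_feats))


# Text input for the two-layer fully connected network (step 2).
vocabulary = ["cat", "dog", "grass"]
tokens = "a cat on grass with another cat".split()
text_input = bag_of_words(tokens, vocabulary)

# Hypothetical outputs of the last layers, one value per hash bit (length 4).
image_feats = [0.9, -0.4, 0.2, -0.7]
text_feats = [0.6, -0.1, -0.3, -0.8]
label_feats = [1.0, -1.0, 1.0, -1.0]

image_code = to_hash_code(image_feats)
text_code = to_hash_code(text_feats)
print(text_input, image_code, text_code, hamming_distance(image_code, text_code))
print(alignment_loss(image_feats, label_feats), alignment_loss(text_feats, label_feats))
```

In this toy case the text features disagree with the label features in sign at one position, which shows up both as a larger alignment loss and as a one-bit Hamming gap between the image and text codes.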

Description

Technical field

[0001] The invention belongs to the technical field of cross-modal deep hash retrieval, and in particular relates to a cross-modal deep hash retrieval method based on self-supervision.

Background

[0002] Artificial intelligence technology has gone through several booms and winters since its birth. The current boom is more forceful than the previous ones because it has a distinctive feature: big data as its core foundation. Big data is characterized not only by its volume but, more importantly, by the diversity of its data types and the low value density of the data. We generate and receive various kinds of information every day. This information is recorded and then analyzed with artificial intelligence techniques to understand our daily behavior and living habits, so as to provide convenient services for our lives. Among the massive multimedia data, some data are not independent of each othe...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F16/51, G06F16/583, G06F16/31, G06F16/33, G06K9/62

CPC: G06F16/51, G06F16/583, G06F16/325, G06F16/3344, G06F18/23

Inventor: 王轩, 漆舒汉, 李逸凡, 蒋琳, 廖清, 刘洋, 夏文, 李化乐, 吴宇琳, 贾丰玮

Owner: HARBIN INST OF TECH SHENZHEN GRADUATE SCHOOL
