
Semi-paired multi-modal data hash coding method

A hash-coding, multi-modal technology applied in the field of cross-modal retrieval, addressing problems such as the limited retrieval accuracy of hash codes and the limited nonlinear fitting ability of shallow models.

Active Publication Date: 2020-02-04
NORTHWESTERN POLYTECHNICAL UNIV


Problems solved by technology

However, both of these methods use shallow models, which can be regarded as two-layer neural networks with only an input layer and an output layer, and which therefore have limited nonlinear fitting capability. As a result, for large-scale multimodal data with complex structure, the hash codes generated by shallow models have limited retrieval accuracy.

Method used



Examples


Embodiment 1

[0078] Multimodal hash coding represents multiple pairs of real-number vectors with the same group of binary vectors, so as to achieve cross-modal retrieval. For example, images and their text tags collected from social networks are paired; through multimodal hash coding it is possible to retrieve images with text tags, or retrieve text tags with images. Semi-pairing means that only part of the pairing information of the multimodal data is known, while full pairing means that all the data in the multimodal data are in one-to-one correspondence. For example, pictures and their accompanying texts in WeChat Moments usually correspond one to one; such data are fully paired multimodal data. By contrast, for pictures and texts obtained directly from web pages, the picture and the text describing its content are sometimes not adjacent for typesetting reasons, so it cannot be judged in advance which words descr...
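The cross-modal retrieval described in [0078] can be illustrated with a toy sketch: paired image and text items share one binary code, and querying with an image code ranks text codes by Hamming distance. The codes below are random placeholders, not codes learned by the patented method, and the dataset sizes are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy semi-paired dataset: 6 image items and 6 text items, but pairing
# information is known only for the first 3 pairs.
n_paired, n_total, n_bits = 3, 6, 8

# Shared binary codes (one 8-bit code per item); in the patent's method
# these would be learned, here they are random placeholders.
codes_img = rng.integers(0, 2, size=(n_total, n_bits))
codes_txt = codes_img.copy()
# For the unpaired part the text codes need not agree with the image codes.
codes_txt[n_paired:] = rng.integers(0, 2, size=(n_total - n_paired, n_bits))

def hamming(a, b):
    """Hamming distance between two binary codes."""
    return int(np.sum(a != b))

# Cross-modal retrieval: query with an image code, rank all text codes.
query = codes_img[0]
ranking = sorted(range(n_total), key=lambda j: hamming(query, codes_txt[j]))
print(ranking[0])  # 0: the paired text is the nearest neighbour
```

Because paired items share a code, the matching text item is always at Hamming distance zero from its image query.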

Embodiment 2

[0173] The purpose of this embodiment is to provide a computer system.

[0174] A computer system comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the following:

[0175] Obtain the image information matrix and text information matrix of semi-paired multimodal data;

[0176] constructing a first neural network that maps images to the text space and a second neural network that maps text to the image space, and selecting an encoding layer in each of the first and second neural networks;

[0177] establishing an objective function using the encoding layers;

[0178] training the first and second neural networks according to the objective function to obtain a hash coding matrix of the semi-paired multimodal data.
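The steps of [0175]-[0178] can be sketched as follows, assuming (hypothetically) one-hidden-layer networks whose tanh hidden layer serves as the encoding layer. All dimensions, weights, and data are illustrative placeholders; the patent's actual objective function and training procedure are omitted, and only the structure and the binarisation of the encoding layer are shown.

```python
import numpy as np

rng = np.random.default_rng(1)

d_img, d_txt, d_code = 128, 10, 16   # e.g. SIFT images, LDA texts, 16-bit codes

# First network: image -> text space; its hidden layer is the encoding layer.
W1_in  = rng.standard_normal((d_img, d_code)) * 0.1
W1_out = rng.standard_normal((d_code, d_txt)) * 0.1

# Second network: text -> image space, with its own encoding layer.
W2_in  = rng.standard_normal((d_txt, d_code)) * 0.1
W2_out = rng.standard_normal((d_code, d_img)) * 0.1

def encode_image(x):
    """Encoding-layer activation of the image->text network."""
    return np.tanh(x @ W1_in)

def encode_text(y):
    """Encoding-layer activation of the text->image network."""
    return np.tanh(y @ W2_in)

X = rng.standard_normal((5, d_img))   # image information matrix (5 samples)
Y = rng.standard_normal((5, d_txt))   # text information matrix

# An objective function on the encoding layers would pull paired codes
# together during training; after training, binarising the encoding-layer
# activations yields the hash coding matrices.
H_img = np.sign(encode_image(X))
H_txt = np.sign(encode_text(Y))
print(H_img.shape)  # (5, 16): one 16-bit code per sample
```

Using a hidden layer as the encoding layer means the hash code is a nonlinear function of the input features, which is the advantage the abstract claims over shallow (two-layer) models.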

Embodiment 3

[0180] The purpose of this embodiment is to provide a computer-readable storage medium.

[0181] A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, performs the following steps:

[0182] Obtain the image information matrix and text information matrix of semi-paired multimodal data;

[0183] constructing a first neural network that maps images to the text space and a second neural network that maps text to the image space, and selecting an encoding layer in each of the first and second neural networks;

[0184] establishing an objective function using the encoding layers;

[0185] training the first and second neural networks according to the objective function to obtain a hash coding matrix of the semi-paired multimodal data.



Abstract

The invention discloses a semi-paired multi-modal data hash coding method. The method comprises the steps of obtaining an image information matrix and a text information matrix of semi-paired multi-modal data; constructing a first neural network for mapping the image to a text space and a second neural network for mapping the text to the image space, and respectively selecting a coding layer from the first neural network and the second neural network; establishing a target function by using the coding layers; and training the first neural network and the second neural network according to the target function to obtain a hash coding matrix of the semi-paired multi-modal data. The method adopts a deep neural network; compared with existing shallow-model methods it has better nonlinear fitting capacity, and the generated hash codes have higher precision and diversity.

Description

technical field

[0001] The invention belongs to the technical field of cross-modal retrieval, and in particular relates to a hash coding method for semi-paired multi-modal data.

Background technique

[0002] Hash coding is a method of representing a real-number vector as a binary vector; it reduces the amount of computation by replacing retrieval over real-number vectors with retrieval over binary vectors. Multimodal data refers to different types of real vectors. For example, the SIFT (Scale-Invariant Feature Transform) feature used to represent an image is a 128-dimensional real vector, while the LDA (Latent Dirichlet Allocation, a document topic model) feature used to represent text is a 10-dimensional real vector; these two sets of real vectors are data of two different modalities.

[0003] Multimodal hash coding is to represent multiple pairs of real number vectors with the same group of binary number vectors...
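The computational saving claimed in [0002] comes from how binary codes are compared: two packed codes need only one XOR and a popcount, instead of a floating-point distance over, say, a 128-dimensional SIFT vector. A minimal illustration with two hypothetical 8-bit codes:

```python
# Two 8-bit hash codes, written as integer bit patterns.
a = 0b10110010
b = 0b10010110

# Hamming distance = number of set bits in the XOR of the two codes.
dist = bin(a ^ b).count("1")
print(dist)  # 2: the codes differ in exactly two bit positions
```

This single-instruction-style comparison is what makes binary-code retrieval cheap at large scale.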

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F16/51; G06F16/583; G06N3/04; G06N3/08
CPC: G06F16/51; G06F16/5846; G06N3/08; G06N3/045
Inventors: 田大湧, 周德云, 魏仪文, 侍佼, 雷雨
Owner: NORTHWESTERN POLYTECHNICAL UNIV