
Semi-paired multi-modal data hash coding method

A hash-coding, multi-modal technology applied in the field of cross-modal retrieval, addressing problems such as the limited retrieval accuracy of hash codes and the limited nonlinear fitting ability of shallow models.

Active Publication Date: 2020-02-04
NORTHWESTERN POLYTECHNICAL UNIV


Problems solved by technology

However, both of these methods use shallow models, which can be regarded as two-layer neural networks with only an input layer and an output layer, and which therefore have limited nonlinear fitting capability. As a result, for large-scale multimodal data with complex structure, the hash codes generated by shallow models have limited retrieval accuracy.

Method used



Examples


Embodiment 1

[0078] Multimodal hash coding represents multiple pairs of real-number vectors with the same group of binary vectors, so as to achieve cross-modal retrieval. For example, images and their text tags collected from social networks are paired; through multimodal hash coding it is possible to retrieve images with text tags, or retrieve text tags with images. Semi-pairing means that only part of the pairing information of the multimodal data is known, while full pairing means that all the data in the multimodal data are in one-to-one correspondence. For example, pictures and their accompanying texts in WeChat Moments usually correspond one to one; such data are fully paired multimodal data. By contrast, for pictures and texts obtained directly from web pages, the picture and the text describing its content are sometimes not adjacent for typesetting reasons, so it cannot be judged in advance which words descr...
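The cross-modal retrieval described in [0078] can be illustrated with a toy sketch: paired image and text items share one binary code, and querying with an image code ranks text codes by Hamming distance. The codes below are random placeholders, not codes learned by the patented method, and the dataset sizes are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy semi-paired dataset: 6 image items and 6 text items, but pairing
# information is known only for the first 3 pairs.
n_paired, n_total, n_bits = 3, 6, 8

# Shared binary codes (one 8-bit code per item); in the patent's method
# these would be learned, here they are random placeholders.
codes_img = rng.integers(0, 2, size=(n_total, n_bits))
codes_txt = codes_img.copy()
# For the unpaired part the text codes need not agree with the image codes.
codes_txt[n_paired:] = rng.integers(0, 2, size=(n_total - n_paired, n_bits))

def hamming(a, b):
    """Hamming distance between two binary codes."""
    return int(np.sum(a != b))

# Cross-modal retrieval: query with an image code, rank all text codes.
query = codes_img[0]
ranking = sorted(range(n_total), key=lambda j: hamming(query, codes_txt[j]))
print(ranking[0])  # 0: the paired text is the nearest neighbour
```

Because paired items share a code, the matching text item is always at Hamming distance zero from its image query.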

Embodiment 2

[0173] The purpose of this embodiment is to provide a computer system.

[0174] A computer system comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the following:

[0175] Obtain the image information matrix and text information matrix of semi-paired multimodal data;

[0176] constructing a first neural network that maps images to the text space and a second neural network that maps text to the image space, and selecting an encoding layer in each of the first and second neural networks;

[0177] establishing an objective function using the encoding layers;

[0178] training the first and second neural networks according to the objective function to obtain a hash coding matrix of the semi-paired multimodal data.
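The steps of [0175]-[0178] can be sketched as follows, assuming (hypothetically) one-hidden-layer networks whose tanh hidden layer serves as the encoding layer. All dimensions, weights, and data are illustrative placeholders; the patent's actual objective function and training procedure are omitted, and only the structure and the binarisation of the encoding layer are shown.

```python
import numpy as np

rng = np.random.default_rng(1)

d_img, d_txt, d_code = 128, 10, 16   # e.g. SIFT images, LDA texts, 16-bit codes

# First network: image -> text space; its hidden layer is the encoding layer.
W1_in  = rng.standard_normal((d_img, d_code)) * 0.1
W1_out = rng.standard_normal((d_code, d_txt)) * 0.1

# Second network: text -> image space, with its own encoding layer.
W2_in  = rng.standard_normal((d_txt, d_code)) * 0.1
W2_out = rng.standard_normal((d_code, d_img)) * 0.1

def encode_image(x):
    """Encoding-layer activation of the image->text network."""
    return np.tanh(x @ W1_in)

def encode_text(y):
    """Encoding-layer activation of the text->image network."""
    return np.tanh(y @ W2_in)

X = rng.standard_normal((5, d_img))   # image information matrix (5 samples)
Y = rng.standard_normal((5, d_txt))   # text information matrix

# An objective function on the encoding layers would pull paired codes
# together during training; after training, binarising the encoding-layer
# activations yields the hash coding matrices.
H_img = np.sign(encode_image(X))
H_txt = np.sign(encode_text(Y))
print(H_img.shape)  # (5, 16): one 16-bit code per sample
```

Using a hidden layer as the encoding layer means the hash code is a nonlinear function of the input features, which is the advantage the abstract claims over shallow (two-layer) models.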

Embodiment 3

[0180] The purpose of this embodiment is to provide a computer-readable storage medium.

[0181] A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, performs the following steps:

[0182] Obtain the image information matrix and text information matrix of semi-paired multimodal data;

[0183] constructing a first neural network that maps images to the text space and a second neural network that maps text to the image space, and selecting an encoding layer in each of the first and second neural networks;

[0184] establishing an objective function using the encoding layers;

[0185] training the first and second neural networks according to the objective function to obtain a hash coding matrix of the semi-paired multimodal data.



Abstract

The invention discloses a semi-paired multi-modal data hash coding method. The method comprises the steps of obtaining an image information matrix and a text information matrix of semi-paired multi-modal data; constructing a first neural network for mapping the image to a text space and a second neural network for mapping the text to the image space, and respectively selecting a coding layer from the first neural network and the second neural network; establishing a target function by using the coding layers; and training the first neural network and the second neural network according to the target function to obtain a hash coding matrix of the semi-paired multi-modal data. The method adopts a deep neural network; compared with existing shallow-model methods it has better nonlinear fitting capacity, and the generated hash codes have higher precision and diversity.

Description

technical field

[0001] The invention belongs to the technical field of cross-modal retrieval, and in particular relates to a hash coding method for semi-paired multi-modal data.

Background technique

[0002] Hash coding is a method of representing a real-number vector as a binary vector; it reduces the amount of computation by replacing retrieval over real-number vectors with retrieval over binary vectors. Multimodal data refers to different types of real vectors. For example, the SIFT (Scale-Invariant Feature Transform) feature used to represent an image is a 128-dimensional real vector, while the LDA (Latent Dirichlet Allocation, a document topic model) feature used to represent text is a 10-dimensional real vector; these two sets of real vectors are data of two different modalities.

[0003] Multimodal hash coding is to represent multiple pairs of real number vectors with the same group of binary number vectors...
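The computational saving claimed in [0002] comes from how binary codes are compared: two packed codes need only one XOR and a popcount, instead of a floating-point distance over, say, a 128-dimensional SIFT vector. A minimal illustration with two hypothetical 8-bit codes:

```python
# Two 8-bit hash codes, written as integer bit patterns.
a = 0b10110010
b = 0b10010110

# Hamming distance = number of set bits in the XOR of the two codes.
dist = bin(a ^ b).count("1")
print(dist)  # 2: the codes differ in exactly two bit positions
```

This single-instruction-style comparison is what makes binary-code retrieval cheap at large scale.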

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F16/51; G06F16/583; G06N3/04; G06N3/08
CPC: G06F16/51; G06F16/5846; G06N3/08; G06N3/045
Inventors: 田大湧, 周德云, 魏仪文, 侍佼, 雷雨
Owner: NORTHWESTERN POLYTECHNICAL UNIV