Cross-modal image audio retrieval method based on deep heterogeneous correlation learning

A cross-modal image and audio retrieval technology based on deep heterogeneous correlation learning. It addresses the large storage space required by existing methods, the insufficient use of heterogeneous correlations, and the inability to effectively select cross-modal paired samples, with the effect of improving retrieval accuracy and reducing quantization error.

Pending Publication Date: 2021-09-03

WUHAN UNIV OF TECH


Problems solved by technology

Although existing cross-modal remote sensing image-sound retrieval methods have made some progress, existing cross-modal image-audio retrieval methods still have some limitations:

(1) Existing methods do not fully learn heterogeneous correlations, so these correlations are underutilized in cross-modal learning.

(2) Existing image and audio retrieval methods use high-dimensional real-valued features for cross-modal retrieval, which requires a large amount of storage space.

(3) Some existing cross-modal retrieval methods cannot effectively select good cross-modal paired samples, which ultimately reduces the effectiveness of cross-modal correlation learning.
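To make limitation (2) concrete, the following minimal sketch compares the storage cost of a real-valued feature vector with that of a binary hash code, and shows how two hash codes can be compared by Hamming distance. The 512-dimensional feature, the 32-bit float width, and the 64-bit code length are illustrative assumptions and do not come from the patent.

```python
def real_valued_bytes(dim, bytes_per_value=4):
    """Bytes needed to store one real-valued feature (float32 assumed)."""
    return dim * bytes_per_value


def hash_code_bytes(num_bits):
    """Bytes needed to store one binary hash code, rounded up to whole bytes."""
    return (num_bits + 7) // 8


def hamming_distance(code_a, code_b):
    """Number of differing bits between two hash codes stored as integers."""
    return bin(code_a ^ code_b).count("1")


# Hypothetical sizes, chosen only for illustration.
feature_cost = real_valued_bytes(512)   # 512 floats
code_cost = hash_code_bytes(64)         # 64-bit hash code
distance = hamming_distance(0b1011, 0b0010)
```

Under these assumed sizes, one hash code takes a small fraction of the space of one real-valued feature, and comparing two codes reduces to an XOR followed by a bit count.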


Examples


Embodiment 1

[0048] The environment used in this embodiment is a GeForce GTX Titan X GPU, an Intel Core i7-5930K 3.50 GHz CPU, 64 GB of RAM, and the Linux operating system, with development done in Python using the open-source library Keras.

[0049] The first step is to divide the training data set and test data set:

[0050] Using the Mirflickr 25K image and audio data set, construct 50,000 positive and negative image-audio sample pairs; select 40,000 pairs as the training data set I_train, and use the remaining 10,000 pairs as the test data set I_test.

[0051] In the second step, sample pairs are selected using the cross-modal pairing structure:

[0052] First construct a set of N two-tuple samples and the corresponding set of two-tuple labels. The two-tuple sample set consists of positive sample pairs and negative sample pairs, where I_i denotes the i-th image, V_i denotes the i-th audio clip, and the label y_i ∈ {0,1}; a label of 1 indicates that the image and audio are semantic...
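The 40,000 / 10,000 split described in this step could be sketched as follows. The source does not say how the training pairs are chosen, so the seeded random shuffle, the function name `split_pairs`, and the integer stand-ins for image-audio pairs are all assumptions made for illustration.

```python
import random


def split_pairs(pairs, num_train, seed=0):
    """Shuffle a copy of the pairs and split it into training and test sets."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    return shuffled[:num_train], shuffled[num_train:]


# Integer IDs stand in for the 50,000 image-audio pairs (hypothetical).
all_pairs = list(range(50_000))
i_train, i_test = split_pairs(all_pairs, 40_000)
```

Fixing the seed keeps the split reproducible across runs, which matters when comparing retrieval results between experiments.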
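As a rough illustration of a labeled two-tuple sample set, the sketch below builds (image, audio, label) triples with labels in {0, 1}. It is not the patent's pairing strategy, whose details are truncated above: it simply assumes that `images[i]` and `audios[i]` form a matching pair labeled 1, and draws mismatched pairs at random and labels them 0. The function name and the string identifiers are hypothetical.

```python
import random


def build_pairs(images, audios, num_negatives, seed=0):
    """Return (image, audio, label) triples.

    Assumption: images[i] and audios[i] match (label 1);
    any pair with different indices is treated as non-matching (label 0).
    Requires at least two items so that a mismatched pair exists.
    """
    rng = random.Random(seed)
    n = len(images)
    pairs = [(images[i], audios[i], 1) for i in range(n)]
    while len(pairs) < n + num_negatives:
        i, j = rng.randrange(n), rng.randrange(n)
        if i != j:  # skip accidental positives
            pairs.append((images[i], audios[j], 0))
    return pairs


images = ["img0", "img1", "img2"]
audios = ["aud0", "aud1", "aud2"]
pairs = build_pairs(images, audios, num_negatives=3)
```

Random negatives may repeat in this sketch; a real pipeline would likely deduplicate them or select them by a more deliberate criterion.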


Abstract

The invention discloses a cross-modal image and audio retrieval method based on deep heterogeneous correlation learning. The method mainly addresses the problem that existing methods do not sufficiently utilize the heterogeneous correlation information between images and audio. First, a new cross-modal pairwise construction strategy is designed to select effective image and audio pairs, which helps capture the heterogeneous correlation between images and audio. The relationship between image and audio is then established using the heterogeneous correlation of their deep features: hash codes for image and audio retrieval are generated by bridging the deep feature correlation between the two modalities, and a regularization constraint reduces the quantization error between the hash-like codes and the hash codes. In this way the heterogeneous correlation of the deep features is fully utilized and retrieval performance is further improved.
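The relationship between hash-like codes, hash codes, and the quantization error can be sketched as follows. The abstract does not give the exact formulas, so this example assumes a tanh activation to produce hash-like codes in (-1, 1), sign binarization to produce hash codes in {-1, +1}, and a squared-difference penalty as the regularization term. All function names are hypothetical.

```python
import math


def hash_like(features):
    """Map real-valued features to hash-like codes in (-1, 1) (tanh assumed)."""
    return [math.tanh(x) for x in features]


def binarize(code):
    """Quantize a hash-like code to a hash code with entries in {-1, +1}."""
    return [1 if v >= 0 else -1 for v in code]


def quantization_penalty(code):
    """Squared distance between a hash-like code and its binarized hash code."""
    return sum((v - b) ** 2 for v, b in zip(code, binarize(code)))


relaxed = hash_like([2.0, -0.5, 0.1])
bits = binarize(relaxed)
penalty = quantization_penalty(relaxed)
```

Adding such a penalty to the training loss pushes each entry of the hash-like code toward -1 or +1, so less information is lost when the code is binarized for retrieval.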

Description

Technical field

[0001] The invention belongs to the field of image retrieval, and in particular relates to a cross-modal image and audio retrieval method based on deep heterogeneous correlation learning.

Background technique

[0002] With the explosive growth of image, text, audio and video data on the Internet, cross-modal image and audio retrieval has been widely used in the fields of computer vision and natural language processing, with typical application scenarios such as search engines and driverless vehicles. The task of cross-modal image-audio retrieval is to retrieve relevant images given audio, or to retrieve relevant audio given images. However, due to the heterogeneity of multimodal data, it is difficult for users to obtain useful information quickly and accurately. Therefore, improving retrieval efficiency and solving the heterogeneity problem of multimodal data are two major challenges for cross-modal retrieval tasks.

[0003] At present, some deep lea...


Application Information


IPC(8): G06F16/583, G06F16/683, G06K9/62, G06N3/04, G06N3/08

CPC: G06F16/583, G06F16/683, G06N3/08, G06N3/048, G06F18/214

Inventor: 陈亚雄, 汤一博, 熊盛武, 荣毅, 路雄博

Owner WUHAN UNIV OF TECH
