Cross-modal image audio retrieval method based on deep heterogeneous correlation learning

A cross-modal image and audio retrieval technology, applied in the field of cross-modal image and audio retrieval based on deep heterogeneous correlation learning, which addresses problems such as large storage space requirements, insufficient utilization of heterogeneous correlation relations, and the inability to effectively select cross-modal paired samples, and achieves the effect of improving retrieval accuracy and reducing quantization error.

Pending Publication Date: 2021-09-03
WUHAN UNIV OF TECH

AI Technical Summary

Problems solved by technology

Although existing cross-modal remote sensing image-audio retrieval methods have made some progress, current cross-modal image-audio retrieval methods still have several limitations: (1) existing methods do not fully learn heterogeneous correlation relations, leading to underutilization of heterogeneous correlations in cross-modal learning...



Examples


Example Embodiment

[0047] Example 1

[0048] The environment used in this embodiment is a GeForce GTX Titan X GPU, an Intel Core i7-5930K 3.50 GHz CPU, 64 GB RAM, and the Linux operating system, using Python and the open-source library Keras for development.

[0049] The first step is to divide the data into a training data set and a test data set:

[0050] Using the MIRFLICKR-25K image and audio data set, construct 50,000 positive and negative image-audio sample pairs, select 40,000 pairs as the training data set I_train, and use the remaining 10,000 pairs as the test data set I_test;
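A minimal sketch of this split, assuming the 50,000 pairs are held in memory as a Python list of (image, audio, label) tuples; the function and variable names below are illustrative and not taken from the patent:

```python
import random

def split_pairs(pairs, n_train=40000, seed=0):
    """Shuffle the image-audio pairs and split them into I_train / I_test."""
    rng = random.Random(seed)
    shuffled = list(pairs)            # keep the original list intact
    rng.shuffle(shuffled)
    i_train = shuffled[:n_train]      # 40,000 training pairs
    i_test = shuffled[n_train:]       # remaining 10,000 test pairs
    return i_train, i_test
```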

[0051] In the second step, paired samples are selected using the cross-modal pairwise construction strategy:

[0052] First construct N binary sample pairs {(I_i, V_i)}, i = 1, ..., N, and the corresponding set of pair labels {y_i}. The binary sample set consists of positive sample pairs and negative sample pairs, where I_i denotes the i-th image, V_i denotes the i-th audio, and the label y_i ∈ {0, 1}; a label of 1 indicates that the image and the audio are a matched (positive) pair, and a label of 0 indicates a mismatched (negative) pair.
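A minimal sketch of one way to build such a labelled pair set, assuming a list of matched (image, audio) samples from the data set; the patent's exact construction strategy for selecting "effective" pairs is not reproduced here:

```python
import random

def build_binary_pairs(matched, n_pairs, seed=0):
    """Build (I_i, V_i, y_i) tuples with y_i = 1 for matched pairs and
    y_i = 0 for mismatched pairs, drawn from a list of matched samples."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        if rng.random() < 0.5:
            # positive pair: image and audio belong to the same sample
            image, audio = rng.choice(matched)
            pairs.append((image, audio, 1))
        else:
            # negative pair: image and audio come from two different samples
            (image, _), (_, audio) = rng.sample(matched, 2)
            pairs.append((image, audio, 0))
    return pairs
```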


Abstract

The invention discloses a cross-modal image and audio retrieval method based on deep heterogeneous correlation learning. The method mainly solves the problem that the heterogeneous correlation information of images and audio is not sufficiently utilized by existing methods. Firstly, a new cross-modal pairwise construction strategy is designed to select effective image and audio pairs, which is beneficial to capturing the heterogeneous correlation between images and audio. The relationship between the image and the audio is established by utilizing the heterogeneous correlation of the deep features; hash codes are generated for image and audio retrieval by bridging the deep feature correlation between the image and the audio; and the quantization error between the hash-like codes and the hash codes is reduced by using a regularization constraint. The heterogeneous correlation of the deep features is thus fully utilized, and the retrieval performance is further improved.
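The quantization constraint can be read as penalizing the gap between the network's continuous hash-like codes and their binarized versions. Below is a minimal NumPy sketch of such a regularizer, assuming the common ||sign(H) - H||^2 form used in deep hashing; the patent's exact regularization term may differ:

```python
import numpy as np

def quantization_loss(hash_like):
    """Mean squared gap between continuous hash-like codes (e.g. tanh
    outputs in [-1, 1]) and their binarized hash codes sign(H)."""
    binary = np.sign(hash_like)                  # hash codes in {-1, +1}
    return np.mean(np.sum((binary - hash_like) ** 2, axis=1))

# illustrative usage: 4 samples with 16-bit hash-like codes
codes = np.tanh(np.random.randn(4, 16))
print(quantization_loss(codes))
```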

Description

Technical Field

[0001] The invention belongs to the field of image retrieval, and in particular relates to a cross-modal image and audio retrieval method based on deep heterogeneous correlation learning.

Background Technique

[0002] With the explosive growth of image, text, audio and video data on the Internet, cross-modal image and audio retrieval has been widely applied in the fields of computer vision and natural language processing; search engines and driverless vehicles are typical application scenarios. The task of cross-modal image-audio retrieval is to retrieve relevant images with audio, or to retrieve relevant audio with images. However, due to the heterogeneity of multimodal data, it is difficult for users to obtain useful information quickly and accurately. Therefore, how to improve retrieval efficiency and how to bridge the heterogeneity of multimodal data are two great challenges for cross-modal retrieval tasks.

[0003] At present, some deep learning...


Application Information

IPC(8): G06F16/583; G06F16/683; G06K9/62; G06N3/04; G06N3/08
CPC: G06F16/583; G06F16/683; G06N3/08; G06N3/048; G06F18/214
Inventor: 陈亚雄, 汤一博, 熊盛武, 荣毅, 路雄博
Owner: WUHAN UNIV OF TECH