Cross-modal image audio retrieval method based on deep heterogeneous correlation learning

A cross-modal image and audio retrieval technology based on deep heterogeneous correlation learning. It addresses the problems of large storage requirements, insufficient use of heterogeneous correlations, and the inability to select cross-modal paired samples effectively, with the effect of improving retrieval accuracy and reducing quantization error.

Pending Publication Date: 2021-09-03
WUHAN UNIV OF TECH
7 Cites 0 Cited by

AI-Extracted Technical Summary

Problems solved by technology

Existing cross-modal remote sensing image-sound retrieval methods have made some progress, but existing cross-modal image-audio retrieval methods still have limitations: (1) Existing methods do not fully learn heterogeneous correlations, leading to underutilization of heterogeneous correlations in cross-modal lear...

Abstract

The invention discloses a cross-modal image-audio retrieval method based on deep heterogeneous correlation learning. The method mainly addresses the insufficient use of heterogeneous correlation information between images and audio in existing methods. First, a new cross-modal pairwise construction strategy is designed to select effective image-audio pairs, which helps capture the heterogeneous correlation between images and audio. Second, the relationship between image and audio is established from the heterogeneous correlation of their deep features: hash codes for image and audio retrieval are generated by bridging the deep feature correlation between the two modalities, and a regularization constraint reduces the quantization error between the hash-like codes and the hash codes. By fully exploiting the heterogeneous correlation of deep features, the method further improves retrieval performance.
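The quantization error mentioned in the abstract can be illustrated with a minimal sketch. It assumes that hash codes take values in {-1, +1}, that binarization uses the sign of each entry, and that the error is a squared L2 distance. None of these choices is stated in the text above, and the function names and sample values are hypothetical.

```python
def binarize(hash_like):
    """Map a real-valued hash-like code to a binary code in {-1, +1} (assumed sign rule)."""
    return [1 if h >= 0 else -1 for h in hash_like]


def quantization_error(hash_like):
    """Squared L2 distance between a hash-like code and its binarized code (assumed form)."""
    binary = binarize(hash_like)
    return sum((h - b) ** 2 for h, b in zip(hash_like, binary))


def hamming_distance(code_a, code_b):
    """Number of positions at which two binary codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


# Illustrative hash-like outputs for one image and one audio clip (made-up values).
image_code = [0.9, -0.8, 0.1, -0.95]
audio_code = [0.7, -0.6, -0.2, -0.9]

# Retrieval would compare the binarized codes, for example by Hamming distance.
distance = hamming_distance(binarize(image_code), binarize(audio_code))
```

Entries close to -1 or +1 contribute little to the error, which is why a regularization term of this kind pushes the hash-like code toward the binary code it will be quantized to.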

Examples

Example Embodiment

[0047] Example 1
[0048] The environment used in this embodiment is a GeForce GTX Titan X GPU, an Intel Core i7-5930K 3.50 GHz CPU, 64 GB RAM, and the Linux operating system, with development done in Python using the open source library Keras.
[0049] The first step is to divide the data into a training data set and a test data set:
[0050] From the Mirflickr 25K image and audio data set, build 50,000 image-audio pairs containing both positive and negative samples. Select 40,000 pairs as the training data set I_train and use the remaining 10,000 pairs as the test data set I_test.
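The 40,000 / 10,000 split can be sketched as follows. The text does not say how the 40,000 training pairs are chosen, so the seeded random shuffle, the function name `split_pairs`, and the use of integer pair identifiers are assumptions made for illustration.

```python
import random


def split_pairs(pairs, n_train, seed=0):
    """Shuffle the pairs with a fixed seed and split them into train and test lists."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Stand-in identifiers for the 50,000 image-audio pairs (hypothetical).
pair_ids = range(50_000)
train_ids, test_ids = split_pairs(pair_ids, n_train=40_000)
```

Fixing the seed keeps the split reproducible across runs, which matters when retrieval results are compared between experiments.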
[0051] In the second step, sample pairs are selected using the cross-modal pairwise construction strategy:
[0052] First, construct a set of N two-tuple samples (I_i, V_i) and the corresponding set of two-tuple labels y_i. The sample set consists of positive sample pairs and negative sample pairs, where I_i denotes the i-th image, V_i denotes the i-th audio clip, and the label y_i ∈ {0,1}. A label of 1 indicates that the image and audio are semantically similar, and a label of 0 indicates that they are semantically dissimilar.
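A minimal sketch of building such labeled two-tuples is shown below. The text does not describe how semantic similarity is decided or how positives and negatives are balanced, so this sketch assumes that a pair is similar when the image and audio share a class label and that the two groups are sampled in equal numbers. The function `build_pairs` and the toy labels are hypothetical.

```python
import random


def build_pairs(image_labels, audio_labels, n_pairs, seed=0):
    """Return (image_index, audio_index, y) tuples, half positive and half negative.

    Assumption: an image and an audio clip are semantically similar (y = 1)
    when they carry the same class label, and dissimilar (y = 0) otherwise.
    """
    rng = random.Random(seed)
    positives, negatives = [], []
    for i, image_label in enumerate(image_labels):
        for v, audio_label in enumerate(audio_labels):
            target = positives if image_label == audio_label else negatives
            target.append((i, v))
    half = n_pairs // 2
    chosen = [(i, v, 1) for i, v in rng.sample(positives, half)]
    chosen += [(i, v, 0) for i, v in rng.sample(negatives, n_pairs - half)]
    rng.shuffle(chosen)
    return chosen


# Toy class labels for four images and four audio clips (made-up values).
image_labels = ["sea", "forest", "city", "sea"]
audio_labels = ["sea", "city", "forest", "forest"]
pairs = build_pairs(image_labels, audio_labels, n_pairs=6)
```

Each returned tuple plays the role of (I_i, V_i) together with its label y_i, with indices standing in for the actual image and audio data.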
[0053] The third step is to calculate the feature representation and hash code of the image and audio:
