Image and voice cross-modal retrieval classifier model, retrieval system and retrieval method

A classifier and cross-modal retrieval technology, applicable to still image data retrieval, metadata-based still image retrieval, audio data retrieval and related areas. It addresses the loss of useful or important detail information, low retrieval efficiency, and the failure to meet users' retrieval needs well.

Inactive Publication Date: 2019-07-05
XI'AN INST OF OPTICS & FINE MECHANICS - CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0009] In order to solve the technical problems that the existing image retrieval methods have low retrieval efficiency, may lose some useful or important detailed information, and...

Examples

Embodiment Construction

[0111] Referring to Figure 1, the image and voice cross-modal retrieval method of the present invention is carried out in the following steps:

[0112] Step 1: build an image-voice database in which each image corresponds to a voice;

[0113] Step 2: divide the image-voice database built in Step 1 into an image-voice training set and an image-voice test set;
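Steps 1 and 2 can be illustrated with a minimal sketch. The file names, the 80/20 ratio, the fixed seed and the helper `split_pairs` are all assumptions made for illustration; the text above does not state how the database is divided.

```python
import random


def split_pairs(pairs, test_fraction=0.2, seed=0):
    """Shuffle (image, voice) pairs and split them into train and test sets.

    Hypothetical helper: the split ratio and shuffling are assumptions,
    not values taken from the patent text.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(pairs)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


# Step 1: each image is paired with exactly one voice file (made-up names).
pairs = [(f"img_{i:03d}.jpg", f"voice_{i:03d}.wav") for i in range(10)]

# Step 2: divide the database into a training set and a test set.
train_set, test_set = split_pairs(pairs, test_fraction=0.2, seed=0)
```

Keeping each image and its voice together as one tuple means the pairing cannot be broken by the shuffle, which matters because the pairing is the supervision signal.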

[0114] Step 3: construct the image-speech neural network, which includes an image deep neural subnetwork and a speech convolutional neural subnetwork;

[0115] The network structure of the image deep neural subnetwork is shown in Table 1.

[0116] The speech convolutional neural subnetwork has a one-dimensional structure, including a convolutional layer (Conv) and a pooling layer (Pool). The size of the convolution kernel does not exceed 10, because a larger convolution kernel would greatly increase the complexity. The pooling layer adopts the maximum pooli...
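As a rough illustration of the building blocks named in this paragraph, the sketch below applies one one-dimensional convolution followed by one max-pooling stage to a toy signal. The kernel values, the stride, the pooling window and the function names are assumptions; only the one-dimensional structure, the kernel-size limit of 10 and the use of max pooling come from the text.

```python
MAX_KERNEL_SIZE = 10  # the text states the kernel size does not exceed 10


def conv1d(signal, kernel, stride=1):
    """Valid 1-D convolution (cross-correlation form, as in most CNN libraries)."""
    k = len(kernel)
    if k > MAX_KERNEL_SIZE:
        raise ValueError("kernel size must not exceed 10")
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def max_pool1d(values, size=2, stride=None):
    """Non-overlapping max pooling by default (stride equals window size)."""
    stride = stride or size
    return [
        max(values[i:i + size])
        for i in range(0, len(values) - size + 1, stride)
    ]


# Toy waveform and a hand-picked difference kernel (illustrative values only).
signal = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, -1.0]
kernel = [1.0, 0.0, -1.0]

conv_out = conv1d(signal, kernel)        # 6 values for an 8-sample input
features = max_pool1d(conv_out, size=2)  # 3 values after pooling
```

Each output sample of `conv1d` costs one multiply-add per kernel element, so the cost grows linearly with kernel size, which is consistent with the complexity concern stated above.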

Abstract

Existing image retrieval methods have low retrieval efficiency, may lose useful or important detail information, and cannot meet the user's real retrieval requirements well. To solve these technical problems, the invention provides an image and voice cross-modal retrieval classifier model, a retrieval system and a retrieval method. An image-voice neural network structure is constructed, and the deep neural network is trained using the correlation between an image and the voice description of that image as supervision information. This yields a function model of the image-voice correlation, so that cross-modal retrieval between image and voice is realized, image retrieval efficiency and retrieval accuracy are improved, and human-computer interaction during retrieval becomes easier.
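Once images and voice descriptions can be scored against each other, retrieval reduces to ranking candidates by that score. The sketch below is an assumption-laden illustration: it stands in cosine similarity for the learned correlation function (the abstract does not say which function is used), and the feature vectors and file names are invented.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(voice_vec, image_vecs):
    """Return image names sorted from most to least similar to the voice query."""
    scores = {name: cosine(voice_vec, vec) for name, vec in image_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical features, as if produced by the two trained subnetworks.
image_vecs = {
    "harbor.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.2],
    "airport.jpg": [0.0, 0.2, 0.9],
}
voice_query = [0.8, 0.2, 0.1]

ranking = rank_images(voice_query, image_vecs)
```

The same ranking function works in the opposite direction (image query, voice candidates) by swapping the roles of the two inputs.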

Description

Technical field

[0001] The invention belongs to the technical field of information processing, and relates to a cross-modal retrieval classifier model, a retrieval system and a retrieval method, which can be used in the fields of pattern recognition, data mining, computer vision and the like.

Background technique

[0002] In recent years, with the massive increase of image data, it has become a thorny problem to quickly retrieve the desired image from massive images.

[0003] Existing image retrieval mainly includes two types of retrieval methods, image search by text and image search by image. The text-to-image approach is highly dependent on the speed of manually entering tags and the availability of tags. However, keyboard input is often less efficient. The image-by-image search method requires example images as the input of the query, but example images usually do not exist in practical applications. The defects of the above two retrieval methods make them unable to b...

Application Information

IPC(8): G06F16/68, G06F16/58, G06K9/62, G06N3/04
CPC: G06N3/044, G06N3/045, G06F18/2411, G06F18/253, G06F18/214
Inventor: 袁媛, 卢孝强, 郭毛
Owner: XI'AN INST OF OPTICS & FINE MECHANICS - CHINESE ACAD OF SCI