Image and voice cross-modal retrieval classifier model, retrieval system and retrieval method

A classifier model and cross-modal technology, applied to still image data retrieval, metadata-based still image retrieval, audio data retrieval and related areas. It addresses the loss of useful or important detail information, low retrieval efficiency, and the failure to meet users' real retrieval needs.

Inactive Publication Date: 2019-07-05

XI'AN INST OF OPTICS & FINE MECHANICS - CHINESE ACAD OF SCI

Cites: 3 · Cited by: 10

Problems solved by technology

[0009] Existing image retrieval methods have low retrieval efficiency, may lose useful or important detail information, and cannot fully meet users' real retrieval needs. To solve these technical problems, the present invention provides a cross-modal retrieval classifier model for images and voices, a retrieval system and a retrieval method.

Embodiment Construction

[0111] Referring to Figure 1, the cross-modal image and voice retrieval method of the present invention is implemented in the following steps:

[0112] Step 1: build an image-voice database in which each image corresponds to one voice recording;

[0113] Step 2: divide the image-voice database built in Step 1 into an image-voice training set and an image-voice test set;
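
Steps 1 and 2 can be sketched as follows. This is a minimal illustration, not the procedure from the patent: the file names, the 20% test fraction, the random seed and the helper name `split_pairs` are all assumptions, since the text does not state how the split is made.

```python
import random


def split_pairs(pairs, test_fraction=0.2, seed=0):
    """Shuffle image-voice pairs and split them into training and test sets.

    test_fraction and seed are illustrative assumptions, not values from the source.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(pairs)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    # Pairs stay intact: an image and its voice always land in the same set.
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical database: each image file is paired with exactly one voice file.
database = [(f"img_{i:03d}.jpg", f"voice_{i:03d}.wav") for i in range(10)]
train_set, test_set = split_pairs(database)
print(len(train_set), len(test_set))
```

Splitting at the level of pairs, rather than splitting images and voices separately, keeps the one-to-one correspondence required by Step 1 in both sets.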

[0114] Step 3: construct the image-speech neural network, which includes an image deep neural subnetwork and a speech convolutional neural subnetwork;

[0115] The network structure of the image deep neural subnetwork is shown in Table 1.

[0116] The speech convolutional neural subnetwork is a one-dimensional structure comprising a convolutional layer (Conv) and a pooling layer (Pool). The size of the convolution kernel does not exceed 10, because a larger convolution kernel would greatly increase the complexity. The pooling layer adopts the maximum pooli...
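
To make the one-dimensional Conv and Pool operations concrete, here is a minimal pure-Python sketch of a single convolution followed by max pooling. It is an illustration under assumptions, not the patented subnetwork: the input values, the kernel weights, the pooling size and the function names `conv1d` and `max_pool1d` are hypothetical, and only the kernel-size limit of 10 comes from the text.

```python
def conv1d(signal, kernel):
    """Valid 1-D convolution with stride 1 (cross-correlation, as deep learning libraries compute it)."""
    if len(kernel) > 10:
        raise ValueError("kernel size should not exceed 10 (constraint from the text)")
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def max_pool1d(signal, size, stride=None):
    """Max pooling over windows of `size`; stride defaults to the window size."""
    stride = stride or size
    return [
        max(signal[i:i + size])
        for i in range(0, len(signal) - size + 1, stride)
    ]


# Hypothetical 1-D speech feature sequence and a hand-picked kernel of size 3.
features = [0.0, 1.0, 3.0, 2.0, 5.0, 4.0, 1.0, 0.0]
kernel = [1.0, 0.0, -1.0]

conv_out = conv1d(features, kernel)
pooled = max_pool1d(conv_out, size=2)
print(conv_out)
print(pooled)
```

Each output of a kernel of size k costs k multiply-adds per channel, so the work grows linearly with k, which is consistent with the text's reason for capping the kernel size.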

Abstract

Existing image retrieval methods have low retrieval efficiency, may lose useful or important detail information, and cannot fully meet users' real retrieval needs. To solve these technical problems, the invention provides an image and voice cross-modal retrieval classifier model, a retrieval system and a retrieval method. The invention constructs an image-voice neural network structure. The deep neural network is trained using the correlation between an image and the voice description of that image as supervision information, yielding a function model of the image-voice correlation. This realizes cross-modal retrieval between images and voices, improves image retrieval efficiency and retrieval accuracy, and makes human-computer interaction during retrieval easier.
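
The retrieval step implied by the abstract can be sketched as ranking candidate images by a correlation score against a voice query. This sketch rests on several assumptions: cosine similarity is used as a stand-in for the learned image-voice correlation model, which the text does not specify, and the feature vectors, file names and function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; a stand-in for the learned correlation model (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_voice, image_features, score=cosine, top_k=2):
    """Return the names of the top_k images ranked by score against the voice query."""
    ranked = sorted(
        image_features.items(),
        key=lambda item: score(query_voice, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


# Hypothetical features that the two subnetworks might produce for images and a voice query.
image_features = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.0, 1.0, 0.0],
    "city.jpg": [0.3, 0.0, 0.9],
}
query = [0.8, 0.2, 0.1]

results = retrieve(query, image_features)
print(results)
```

Passing `score` as a parameter keeps the ranking logic separate from the scoring model, so a trained correlation function could replace the cosine stand-in without changing `retrieve`.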

Description

Technical field

[0001] The invention belongs to the technical field of information processing and relates to a cross-modal retrieval classifier model, a retrieval system and a retrieval method, which can be used in fields such as pattern recognition, data mining and computer vision.

Background art

[0002] In recent years, with the massive growth of image data, quickly retrieving a desired image from a massive image collection has become a thorny problem.

[0003] Existing image retrieval mainly comprises two types of methods: searching for images by text and searching for images by image. The text-based approach depends heavily on the speed of manually entering tags and on the availability of tags, and keyboard input is often inefficient. The image-based approach requires an example image as the query input, but example images usually do not exist in practical applications. The defects of the above two retrieval methods make them unable to b...

Application Information

IPC(8): G06F16/68, G06F16/58, G06K9/62, G06N3/04

CPC: G06N3/044, G06N3/045, G06F18/2411, G06F18/253, G06F18/214

Inventor: 袁媛, 卢孝强, 郭毛

Owner: XI'AN INST OF OPTICS & FINE MECHANICS - CHINESE ACAD OF SCI
