Image and voice cross-modal retrieval classifier model, retrieval system and retrieval method
A classifier and cross-modal technology, applied to still image data retrieval, metadata-based still image retrieval, audio data retrieval and related fields, that addresses the loss of useful or important detail information, low retrieval efficiency, and the failure to adequately meet users' retrieval needs.
Embodiment Construction
[0111] Referring to Figure 1, the cross-modal retrieval method for images and voice of the present invention is implemented in the following steps:
[0112] Step 1, build an image-voice database in which each image corresponds to one voice sample;
[0113] Step 2, divide the image-voice database built in Step 1 into an image-voice training set and an image-voice test set;
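Steps 1 and 2 can be illustrated with a minimal sketch. The file names, the 80/20 split ratio, the fixed random seed and the helper name `split_pairs` are all assumptions made for illustration; the source does not specify how the database is partitioned.

```python
import random
from typing import List, Tuple

# One database entry: (image file, voice file). The names are hypothetical.
Pair = Tuple[str, str]


def split_pairs(
    pairs: List[Pair], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle image-voice pairs and split them into training and test sets.

    The split ratio and the seed are illustrative assumptions.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(pairs)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


# Step 1: a toy database in which each image is paired with exactly one voice sample.
pairs = [(f"img_{i:03d}.jpg", f"voice_{i:03d}.wav") for i in range(10)]

# Step 2: divide the database into a training set and a test set.
train_set, test_set = split_pairs(pairs)
```

Each pair is kept intact during the split, so an image and its voice sample always land in the same set.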
[0114] Step 3, construct the image-speech neural network, which comprises an image deep neural subnetwork and a speech convolutional neural subnetwork;
[0115] The network structure of the image deep neural subnetwork is shown in Table 1.
[0116] The speech convolutional neural subnetwork is a one-dimensional structure comprising a convolutional layer (Conv) and a pooling layer (Pool). The size of the convolution kernel does not exceed 10, because a larger convolution kernel would greatly increase the complexity. The pooling layer adopts the maximum pooli...
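The one-dimensional Conv and Pool operations described above can be sketched in plain Python. This is an illustration only, not the patented network: the kernel values, the pooling window of 2, the toy input and the function names `conv1d` and `max_pool1d` are assumptions, and the sketch assumes the pooling layer performs max pooling. Only the kernel-size bound of 10 comes from the text.

```python
from typing import List, Optional

MAX_KERNEL_SIZE = 10  # upper bound on the kernel size stated in the text


def conv1d(signal: List[float], kernel: List[float]) -> List[float]:
    """Valid 1-D convolution as used in CNNs (cross-correlation, stride 1)."""
    k = len(kernel)
    if not 1 <= k <= MAX_KERNEL_SIZE:
        raise ValueError(f"kernel size must be between 1 and {MAX_KERNEL_SIZE}")
    if len(signal) < k:
        raise ValueError("signal is shorter than the kernel")
    # Each output value costs k multiply-adds, so cost grows with kernel size.
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def max_pool1d(
    values: List[float], size: int, stride: Optional[int] = None
) -> List[float]:
    """1-D max pooling; the stride defaults to the window size."""
    if size < 1:
        raise ValueError("size must be at least 1")
    step = size if stride is None else stride
    if step < 1:
        raise ValueError("stride must be at least 1")
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, step)]


# Toy speech-like signal and an illustrative kernel of size 3 (assumed values).
signal = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, -1.0]
kernel = [1.0, 0.0, -1.0]

features = conv1d(signal, kernel)     # Conv layer output
pooled = max_pool1d(features, size=2)  # Pool layer output
```

Because each output sample requires one multiply-add per kernel element, bounding the kernel size keeps both the parameter count and the per-sample cost small, which is consistent with the complexity argument in the text.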