Text and image-oriented cross-media retrieval method and electronic device

A cross-media retrieval technology, applied in the field of text- and image-oriented cross-media retrieval methods and electronic devices, that addresses unequal information content across media, the resulting information loss or noise, and insufficient mining of association relationships, and thereby improves image-text matching.

Active Publication Date: 2020-11-27
INST OF INFORMATION ENG CHINESE ACAD OF SCI
3 Cites, 17 Cited by

AI Technical Summary

Problems solved by technology

[0007] 1. Because different media carry unequal amounts of information, methods based on common semantic space learning may lose part of the information or introduce noise.
[0008] 2. Current methods based on cross-modal feature fusion do not fully mine the association relationships between fine-grained image and text features.

Examples


Embodiment Construction

[0045] In order to make the object, principle, technical solution and advantages of the present invention clearer, the present invention will be described in detail below in conjunction with specific embodiments and with reference to the accompanying drawings.

[0046] The present invention first performs symbolic representation of images and texts. Let the number of words in each text be $T$; each text is expressed as $S = \{s_1, \dots, s_T\}$, where $s_t$ is the feature vector of the $t$-th word. Image $I$ is denoted as $V = \{v_1, \dots, v_N\}$, where $v_n$ is the feature vector of the $n$-th region and $N$ is the number of targets extracted from the image. The speech $P$ is denoted as $P = \{p_1, \dots, p_M\}$, where $p_m$ is the feature vector of the $m$-th frame and $M$ is the number of frames extracted from the whole speech.
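The notation in [0046] can be mirrored by a small container type. The following is only an illustrative sketch: the class name `Sample`, the helper functions, and all dimensions (5 words, 36 regions, 100 frames, and the vector sizes) are assumptions chosen for the example and do not come from the patent text, and random vectors stand in for real features.

```python
import random
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class Sample:
    """One text/image/speech triple, following the notation S, V, P."""
    S: List[Vector]  # T word feature vectors s_1..s_T
    V: List[Vector]  # N region feature vectors v_1..v_N
    P: List[Vector]  # M speech-frame feature vectors p_1..p_M


def random_vectors(count: int, dim: int, rng: random.Random) -> List[Vector]:
    """Return `count` random vectors of length `dim` (placeholder features)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


def make_sample(T: int, N: int, M: int,
                word_dim: int, region_dim: int, frame_dim: int,
                seed: int = 0) -> Sample:
    """Build a toy sample with the requested set sizes and dimensions."""
    rng = random.Random(seed)
    return Sample(
        S=random_vectors(T, word_dim, rng),
        V=random_vectors(N, region_dim, rng),
        P=random_vectors(M, frame_dim, rng),
    )


# Hypothetical sizes: 5 words, 36 image regions, 100 speech frames.
sample = make_sample(T=5, N=36, M=100, word_dim=8, region_dim=16, frame_dim=13)
print(len(sample.S), len(sample.V), len(sample.P))
```

In a real system the three lists would be filled by a text encoder, a region detector, and a speech front end respectively; only the set sizes $T$, $N$, $M$ are taken from the description.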

[0047] The general framework of the model of the present invention includes three parts, which are text feature representation fused with speech, region feature represe...

Abstract

The invention provides a text- and image-oriented cross-media retrieval method and an electronic device. The method comprises the steps of: extracting a g-dimensional MFCC feature from voice information of a set length, and converting the g-dimensional MFCC feature of length m into a one-dimensional voice feature; encoding a set text to obtain a word-level text representation, and splicing each word in the word-level text representation with the one-dimensional voice feature to obtain a voice-guided text feature; and extracting region features of each picture, calculating a similarity score between the region features and the voice-guided text feature, and judging whether the picture contains the set voice information and the set text information, so as to obtain a retrieval result. The invention makes use of the pause information in the voice information and of the association relationships between the voice information and the image and text to improve the performance of the image-text matching task. It models a text feature representation fused with voice information and introduces a fine-grained feature fusion mode based on a local attention mechanism for cross-modal feature fusion, which improves the image-text matching effect.
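The steps in the abstract can be sketched end to end on toy data. This is a rough illustration under several assumptions, not the patented implementation: the abstract does not say how the MFCC frames are reduced to a one-dimensional feature, so mean pooling over frames is used here as a stand-in; the region features are assumed to already have the same dimensionality as the spliced text features; and the local attention is approximated by a softmax over cosine similarities with a hypothetical `temperature` parameter. All function names are invented for the example.

```python
import math
from typing import List

Vector = List[float]


def pool_mfcc(mfcc: List[Vector]) -> Vector:
    """Collapse m frames of g-dim MFCCs into one g-length vector.
    Mean pooling is an assumption; the source does not specify the conversion."""
    m = len(mfcc)
    return [sum(frame[i] for frame in mfcc) / m for i in range(len(mfcc[0]))]


def splice(words: List[Vector], voice: Vector) -> List[Vector]:
    """Concatenate the voice feature onto every word-level vector."""
    return [w + voice for w in words]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def softmax(xs: List[float]) -> List[float]:
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def similarity(text_feats: List[Vector], regions: List[Vector],
               temperature: float = 4.0) -> float:
    """Each word attends over all regions; the score is the mean cosine
    between a word and its attended region vector (assumed attention form)."""
    scores = []
    for w in text_feats:
        weights = softmax([temperature * cosine(w, r) for r in regions])
        attended = [sum(wt * r[i] for wt, r in zip(weights, regions))
                    for i in range(len(w))]
        scores.append(cosine(w, attended))
    return sum(scores) / len(scores)


# Toy inputs: m = 2 frames of g = 2 MFCC coefficients, and two 2-dim words.
pooled = pool_mfcc([[1.0, 0.0], [3.0, 2.0]])
spliced = splice([[1.0, 0.0], [0.0, 1.0]], pooled)

# Regions that resemble the text features versus regions pointing the other way.
score_match = similarity(spliced, [list(v) for v in spliced])
score_mismatch = similarity(spliced, [[-x for x in v] for v in spliced])
print(score_match > score_mismatch)
```

A retrieval decision could then be made by comparing the score against a threshold or by ranking candidate pictures by score; the abstract does not state which, so that step is left out of the sketch.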

Description

Technical field

[0001] The invention relates to the technical field of computer retrieval, in particular to a text- and image-oriented cross-media retrieval method and an electronic device.

Background technique

[0002] Cross-media retrieval means that, given query information in one medium, the user can retrieve semantically related information in other media. One line of research for cross-media retrieval tasks is based on common semantic space learning, the essence of which is to align the distributions and feature representations of data from different modalities. Among these methods, traditional statistical correlation analysis (HOTELLING H. Relations between two sets of variates [M] // Breakthroughs in statistics. Springer, 1992: 162-190.) is the basis of this type of method. Canonical correlation analysis (CCA) (AKAHO S. A kernel method for canonical correlation analysis [J]. arXiv: Learning, 2006.) is the most classic method, because in cross-m...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/432, G06N3/04, G06N3/08
CPC: G06F16/433, G06F16/434, G06N3/08, G06N3/047, G06N3/048, G06N3/045, Y02D10/00
Inventor: 于静, 郭晶晶, 胡玥, 谭建龙, 郭莉
Owner: INST OF INFORMATION ENG CHINESE ACAD OF SCI