Text and image-oriented cross-media retrieval method and electronic device

A cross-media retrieval technology for text and images. It addresses insufficient mining of association relationships, unequal information across media, and noise, and it improves image-text matching.

Active Publication Date: 2020-11-27

INST OF INFORMATION ENG CHINESE ACAD OF SCI

Cites: 3; Cited by: 17

Problems solved by technology

[0007] 1. Because different media carry unequal amounts of information, methods based on common semantic space learning may lose part of the information or introduce noise.

[0008] 2. Current methods based on cross-modal feature fusion do not fully mine the association relationships between the fine-grained features of images and text.

Embodiment Construction

[0045] In order to make the object, principle, technical solution and advantages of the present invention clearer, the present invention will be described in detail below in conjunction with specific embodiments and with reference to the accompanying drawings.

[0046] The present invention first performs symbolic representation of images and texts. Let the number of words in each text be $T$; each text is expressed as $S = \{s_1, \ldots, s_T\}$, where $s_t$ is the feature vector of the $t$-th word. Image $I$ is denoted as $V = \{v_1, \ldots, v_N\}$, where $v_n$ is the feature vector of the $n$-th region and $N$ is the number of targets extracted from the image. The speech is denoted as $P = \{p_1, \ldots, p_M\}$, where $p_m$ is the feature vector of the $m$-th frame and $M$ is the number of frames extracted from the whole speech.
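
As a rough illustration of the notation in [0046], the three inputs can be held as arrays whose first axis counts words, regions, or frames. The class name `Sample` and every feature dimension below are assumptions made for this sketch; the excerpt does not state them.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Sample:
    """Hypothetical container for one text/image/speech triple."""

    S: np.ndarray  # text:   shape (T, d_word),   row t is the word vector s_t
    V: np.ndarray  # image:  shape (N, d_region), row n is the region vector v_n
    P: np.ndarray  # speech: shape (M, d_frame),  row m is the frame vector p_m

    def sizes(self):
        """Return (T, N, M): number of words, regions, and frames."""
        return self.S.shape[0], self.V.shape[0], self.P.shape[0]


# Random stand-in features; the dimensions are illustrative only.
rng = np.random.default_rng(0)
sample = Sample(
    S=rng.normal(size=(5, 300)),
    V=rng.normal(size=(36, 2048)),
    P=rng.normal(size=(120, 13)),
)
print(sample.sizes())  # (5, 36, 120)
```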

[0047] The general framework of the model of the present invention includes three parts, which are text feature representation fused with speech, region feature represe...

Abstract

The invention provides a text- and image-oriented cross-media retrieval method and an electronic device. The method comprises the steps of: extracting a g-dimensional MFCC feature from voice information of a set length, and converting the g-dimensional MFCC feature of length m into a one-dimensional voice feature; encoding a set text to obtain a word-level text representation, and concatenating each word in the word-level text representation with the one-dimensional voice feature to obtain a voice-guided text feature; and extracting the region features of each picture, calculating a similarity score between the region features and the voice-guided text feature, and judging whether the picture contains the set voice information and the set text information, so as to obtain a retrieval result. The invention uses the pause information in the voice and the association relationships between the voice, the image, and the text to improve performance on the image-text matching task. It models a text feature representation fused with voice information and introduces a fine-grained feature fusion mode based on a local attention mechanism for cross-modal feature fusion, which improves image-text matching.
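
The steps in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the MFCC matrix is simply flattened to obtain the one-dimensional voice feature, the projection matrices `W_t` and `W_v` are random stand-ins for learned layers, and the word-to-region softmax attention is a generic substitute for the local attention mechanism, whose details do not appear in this excerpt. All function names and dimensions are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def voice_guided_text(words, mfcc):
    """Concatenate a flattened MFCC vector onto every word vector."""
    voice = mfcc.reshape(-1)                        # (m, g) -> (m*g,), assumed conversion
    tiled = np.tile(voice, (words.shape[0], 1))     # (T, m*g)
    return np.concatenate([words, tiled], axis=1)   # (T, d + m*g)


def similarity_score(text_feats, regions, W_t, W_v, temperature=4.0):
    """Attend from each word to the image regions, then average cosine scores."""
    t = l2_normalize(text_feats @ W_t)              # (T, k) projected words
    v = l2_normalize(regions @ W_v)                 # (N, k) projected regions
    logits = temperature * (t @ v.T)                # (T, N) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=1, keepdims=True)   # softmax over regions, per word
    attended = l2_normalize(attn @ v)               # (T, k) region context per word
    return float(np.mean(np.sum(t * attended, axis=1)))


# Illustrative sizes only: T words, m frames of g MFCC coefficients, N regions.
rng = np.random.default_rng(0)
T, d, m, g, N, r, k = 4, 8, 10, 13, 6, 16, 32
words = rng.normal(size=(T, d))
mfcc = rng.normal(size=(m, g))
regions = rng.normal(size=(N, r))

text_feats = voice_guided_text(words, mfcc)
W_t = rng.normal(size=(d + m * g, k))               # stand-in for a learned projection
W_v = rng.normal(size=(r, k))                       # stand-in for a learned projection
score = similarity_score(text_feats, regions, W_t, W_v)
print(text_feats.shape)  # (4, 138)
```

In a retrieval setting, a score of this kind would be computed for each candidate picture and the pictures ranked by it; the decision rule that turns the score into a retrieval result is not given in the excerpt.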

Description

Technical field

[0001] The invention relates to the technical field of computer retrieval, in particular to a text- and image-oriented cross-media retrieval method and an electronic device.

Background technique

[0002] Cross-media retrieval means that, given query information in one medium, the user can retrieve semantically related information in other media. One line of research on cross-media retrieval is based on common semantic space learning, the essence of which is to align the distributions and feature representations of data from different modalities. Traditional statistical correlation analysis (HOTELLING H. Relations between two sets of variates [M] // Breakthroughs in statistics. Springer, 1992: 162-190.) is the basis of this type of method. Canonical correlation analysis (CCA) (AKAHO S. A kernel method for canonical correlation analysis [J]. arXiv: Learning, 2006.) is the most classic method, because in cross-m...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F16/432, G06N3/04, G06N3/08

CPC: G06F16/433, G06F16/434, G06N3/08, G06N3/047, G06N3/048, G06N3/045, Y02D10/00

Inventor: 于静, 郭晶晶, 胡玥, 谭建龙, 郭莉

Owner: INST OF INFORMATION ENG CHINESE ACAD OF SCI
