Cross-modal image text retrieval method of hybrid fusion model

A hybrid fusion model for cross-modal image-text retrieval, applied in the field of cross-modal retrieval. It addresses problems such as coarse global-only fusion, insufficient cross-modal learning, and mediocre performance, so as to improve retrieval accuracy, promote cross-modal information exchange, and enhance representational ability.

Active Publication Date: 2021-05-11
UNIV OF ELECTRONICS SCI & TECH OF CHINA


Problems solved by technology

[0006] At present, mainstream cross-modal retrieval methods adopt a late fusion strategy: image and text data are embedded and encoded separately by relatively complex network designs. Such methods often suffer from insufficient cross-modal learning while also incurring a high computational cost. Existing early fusion methods, on the other hand, tend to be coarse: they can only fuse image and text data at the global level, and their performance is mediocre.
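
To make the contrast concrete, the following minimal sketch (PyTorch; the encoders, dimensions, and scoring choices are illustrative placeholders, not the patent's networks) shows late fusion scoring separately encoded features against a coarse global early fusion that encodes one concatenated vector:

```python
import torch
import torch.nn as nn

# Placeholder encoders; feature dimensions (2048-d image, 1024-d text)
# are assumptions for illustration only.
img_enc, txt_enc = nn.Linear(2048, 512), nn.Linear(1024, 512)
joint_enc = nn.Linear(2048 + 1024, 512)
score = nn.Linear(512, 1)

img, txt = torch.randn(1, 2048), torch.randn(1, 1024)

# Late fusion: embed each modality separately and compare only at the
# end -- little cross-modal interaction happens during encoding.
s_late = torch.cosine_similarity(img_enc(img), txt_enc(txt), dim=-1)

# Global early fusion: concatenate raw global features, then encode the
# joint vector -- cross-modal interaction occurs, but only coarsely.
s_early = score(joint_enc(torch.cat([img, txt], dim=-1)))
```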


Examples

Embodiment

[0054] Figure 1 is a flowchart of the cross-modal image-text retrieval method of the hybrid fusion model of the present invention.

[0055] In this embodiment, as shown in Figure 1, the cross-modal image-text retrieval method of the hybrid fusion model of the present invention comprises the following steps:

[0056] S1. Extracting cross-modal data features;

[0057] S1.1. Download cross-modal image-text pair data including N groups of images and their corresponding descriptive texts;

[0058] S1.2. For each group of cross-modal image-text pair data, use the region-based convolutional neural network Faster R-CNN to extract the image region feature set V = {v_i}, where v_i represents the i-th image region feature, i = 1, 2, ..., k, and k is the number of elements in the image region feature set (k = 36 in this embodiment); use the gated recurrent unit GRU to extract the text word feature set T = {t_j}, where t_j represents the j-th text word feature, j = 1, 2, ..., l, where l is...
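
As a hedged illustration of step S1.2: region features are assumed here to be pre-extracted by a bottom-up-attention-style Faster R-CNN (the patent's detector configuration is not reproduced), and the GRU word-feature extractor below is a sketch whose vocabulary size and dimensions are assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Maps a token-id sequence to word features T = {t_j}, j = 1..l."""
    def __init__(self, vocab_size=10000, embed_dim=300, hidden_dim=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True,
                          bidirectional=True)

    def forward(self, token_ids):
        # token_ids: (batch, l) -> word features: (batch, l, hidden_dim)
        x = self.embed(token_ids)
        out, _ = self.gru(x)              # (batch, l, 2 * hidden_dim)
        fwd, bwd = out.chunk(2, dim=-1)   # average the two directions
        return (fwd + bwd) / 2

# Region features V: k = 36 regions per image, 2048-d each (a typical
# bottom-up-attention setting; the exact dimension is an assumption).
V = torch.randn(1, 36, 2048)
T = TextEncoder()(torch.randint(0, 10000, (1, 12)))  # l = 12 words
print(V.shape, T.shape)
```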



Abstract

The invention discloses a cross-modal image-text retrieval method based on a hybrid fusion model. In the early fusion structure, local visual region features are first combined with the original global features of the text to obtain a unified cross-modal fusion representation; the fused features are then taken as input to a subsequent embedding network that enhances the interaction between local visual features and language information. Meanwhile, on top of the traditional late fusion structure, the original image and sentence features are fed into a visual encoder and a text encoder respectively for intra-modal feature enhancement, enriching the semantic information of each modality. Finally, the overall network similarity is a weighted linear combination of the early fusion similarity and the late fusion similarity, so that early fusion at the cross-modal learning level and late fusion at the intra-modal learning level complement each other, completing the latent alignment between the image and text modalities.
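
A minimal sketch of the final scoring step described above: the abstract states the overall similarity is a weighted linear combination of the early and late fusion similarities, but the weight alpha, the cosine similarity for the late branch, and the small scoring head for the early branch are assumptions, not values fixed by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical scoring head for the early fusion branch: maps the unified
# cross-modal fusion representation to a scalar similarity.
score_head = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 1))

def hybrid_similarity(fused_repr, img_emb, txt_emb, alpha=0.5):
    # Early fusion similarity: scored from the joint fusion representation.
    s_early = score_head(fused_repr).squeeze(-1)
    # Late fusion similarity: compares the intra-modal enhanced image and
    # sentence embeddings produced by the two separate encoders.
    s_late = F.cosine_similarity(img_emb, txt_emb, dim=-1)
    # Overall network similarity: weighted linear combination of the two.
    return alpha * s_early + (1 - alpha) * s_late

# Toy usage with random features of assumed dimensionality.
s = hybrid_similarity(torch.randn(4, 512), torch.randn(4, 512),
                      torch.randn(4, 512))
print(s.shape)  # torch.Size([4])
```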

Description

Technical field

[0001] The invention belongs to the technical field of cross-modal retrieval, and more specifically relates to a cross-modal image-text retrieval method based on a hybrid fusion model.

Background technique

[0002] Cross-modal retrieval means that a user retrieves semantically relevant data across all modalities by inputting query data in any one modality. With the growing volume of multi-modal data such as text, images, and videos on the mobile Internet, retrieval across different modalities has become a new trend in information retrieval. Realizing fast and accurate image-text retrieval therefore has great application value and economic benefit.

[0003] Since computer vision features from image data and language features from text data naturally exhibit a "heterogeneous gap" in data distribution and underlying feature representation, how to measure the high-level semantic correlation between images and text remains a challenge. The solution of current methods is usually to fu...

Claims


Application Information

Patent Type & Authority Applications(China)
IPC IPC(8): G06F16/583G06F40/194G06F40/30G06K9/62G06N3/04G06N3/08G06N5/04
CPCG06F16/5846G06F40/194G06F40/30G06N3/04G06N3/08G06N5/04G06F18/22G06F18/214G06F18/253
Inventor 徐行王依凡杨阳邵杰申恒涛
Owner UNIV OF ELECTRONICS SCI & TECH OF CHINA