
Image text matching method based on multi-relation perceptual reasoning

An image-text matching method, applicable to reasoning methods, still-image data retrieval, metadata-based still-image retrieval, etc.; it addresses the problem that polysemous words cannot be resolved by fixed word vectors.

Pending Publication Date: 2022-03-01
SICHUAN UNIV

AI Technical Summary

Problems solved by technology

Due to the richness of sentence semantics and the diversity of sentence structures, methods that assign each word a single fixed vector cannot resolve polysemy.




Embodiment Construction

[0026] The present invention is further described below in conjunction with the accompanying drawings:

[0027] Figure 1 is a schematic diagram of the multi-relation perceptual reasoning module proposed by the present invention. The module consists of spatial relational reasoning and semantic relational reasoning, and is used to capture the spatial positional relationships between image regions and the semantic relationships between objects. These visual relational features characterize finer-grained image content and thereby provide a more complete scene interpretation, which facilitates matching with complex textual semantic representations. To verify the soundness of the proposed multi-relation perceptual reasoning module, single-relation reasoning and multi-relation reasoning were both tested, with results shown in Table 1:

[0028] Table 1

[0029]
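The spatial and semantic relational reasoning described in [0027] can be sketched with plain graph convolutions over detected image regions. Everything below is an illustrative assumption rather than the patented design: the feature sizes, the distance-based spatial graph, the cosine-similarity semantic graph, and the averaging used to fuse the two branches.

```python
import numpy as np

def graph_convolution(features, adjacency, weight):
    """One graph-convolution step: aggregate neighbour features through
    the row-normalized relation graph, then project and apply ReLU."""
    norm = adjacency / np.maximum(adjacency.sum(axis=1, keepdims=True), 1e-8)
    return np.maximum(norm @ features @ weight, 0.0)

def semantic_adjacency(features):
    """Semantic relation graph: pairwise cosine similarity of region
    features, clipped at zero (an illustrative choice)."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    return np.maximum(normed @ normed.T, 0.0)

def spatial_adjacency(boxes):
    """Spatial relation graph: edge weight decays with the distance
    between bounding-box centres (boxes are x1, y1, x2, y2)."""
    centres = np.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                        (boxes[:, 1] + boxes[:, 3]) / 2], axis=1)
    dists = np.linalg.norm(centres[:, None, :] - centres[None, :, :], axis=2)
    return np.exp(-dists)

def multi_relation_reasoning(features, boxes, w_sem, w_spa):
    """Run both relational branches and fuse them (here: by averaging)."""
    h_sem = graph_convolution(features, semantic_adjacency(features), w_sem)
    h_spa = graph_convolution(features, spatial_adjacency(boxes), w_spa)
    return (h_sem + h_spa) / 2

rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 16))       # 5 detected regions, 16-d features
boxes = rng.uniform(0, 1, size=(5, 4))   # their bounding boxes
w1 = rng.normal(size=(16, 16)) * 0.1
w2 = rng.normal(size=(16, 16)) * 0.1
out = multi_relation_reasoning(regions, boxes, w1, w2)
print(out.shape)  # (5, 16)
```

The output keeps one relation-aware feature vector per region, so it can replace the raw region features in any downstream matching step.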

[0030] Figure 2 is a structural diag...



Abstract

Aiming at the image-text matching task, the invention provides an image-text matching method based on multi-relation perceptual reasoning, spanning the two fields of computer vision and natural language processing. Sufficiently mining the features of the visual and textual modalities, and aligning features across modalities, is the key difficulty of the image-text matching task. Reasonable use of graph convolutional neural networks helps improve model performance: the method designs a graph-convolution-based multi-relation perceptual reasoning module for images, attends to both the semantic relationships and the spatial positional relationships in the image, extracts richer visual feature representations, and thereby achieves better alignment with textual semantic information. Combined with a BERT-GRU-based text encoder, the deep semantic information of sentences can be comprehensively expressed and well aligned with the image's visual representation. The method has practical significance for applications such as automatic image-text cross-retrieval, children's educational software, and assistance for the visually impaired.
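The BERT-GRU text encoder mentioned in the abstract can be illustrated with a minimal GRU run over contextual token vectors. In this sketch the random "token vectors" merely stand in for BERT outputs, and all dimensions, parameter shapes, and the use of the final hidden state as the sentence embedding are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, p):
    """One GRU update; p holds the gate weight matrices."""
    z = sigmoid(x @ p["Wz"] + h @ p["Uz"])            # update gate
    r = sigmoid(x @ p["Wr"] + h @ p["Ur"])            # reset gate
    h_tilde = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"])  # candidate state
    return (1 - z) * h + z * h_tilde

def encode_sentence(token_vecs, p):
    """Run the GRU over contextual token vectors; the final hidden
    state serves as the sentence embedding in this sketch."""
    h = np.zeros(p["Uz"].shape[0])
    for x in token_vecs:
        h = gru_step(x, h, p)
    return h

rng = np.random.default_rng(2)
d_in, d_h = 12, 8   # illustrative sizes (real BERT outputs are 768-d)
p = {k: rng.normal(size=(d_in if k.startswith("W") else d_h, d_h)) * 0.1
     for k in ["Wz", "Uz", "Wr", "Ur", "Wh", "Uh"]}
tokens = rng.normal(size=(6, d_in))   # stand-in for BERT token embeddings
sent = encode_sentence(tokens, p)
print(sent.shape)  # (8,)
```

Because BERT already injects contextual (and thus polysemy-aware) information into each token vector, the GRU here only has to summarize the sequence into a single sentence-level representation.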

Description

technical field [0001] The present invention relates to the two fields of computer vision and natural language processing, and specifically involves using a multi-relation perceptual reasoning module to focus on the spatial positional and semantic relationships between image regions, and using a BERT-based text encoder to obtain text representations containing contextual semantic information. Background technique [0002] The image-text matching task aims to measure the similarity between an image and a piece of text in a cross-modal embedding space. The task involves learning in two modalities, vision and text, and serves as a bridge between computer vision and natural language processing. [0003] Early image-text matching models mainly used a standard dual-branch embedding architecture to extract image and text features and map them into a shared embedding space for matching. This approach has been proven useful, but only focuses on the...
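At matching time, the cross-modal embedding space described in [0002]-[0003] reduces to a similarity matrix between embedded images and sentences. A minimal sketch, assuming cosine similarity and illustrative embedding dimensions (the patent itself does not fix these choices here):

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project vectors onto the unit sphere so dot products are cosines."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), 1e-8)

def match_scores(image_emb, text_emb):
    """Cosine similarity between every image and every sentence
    in the shared embedding space."""
    return l2_normalize(image_emb) @ l2_normalize(text_emb).T

rng = np.random.default_rng(1)
images = rng.normal(size=(3, 8))   # 3 images embedded to 8-d
texts = rng.normal(size=(4, 8))    # 4 sentences embedded to 8-d
S = match_scores(images, texts)
best = S.argmax(axis=1)            # retrieved sentence index per image
print(S.shape)  # (3, 4)
```

Image-to-text retrieval then ranks the columns of each row of `S`; text-to-image retrieval ranks the rows of each column.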

Claims

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
Login to View More

Application Information

Patent Timeline
no application
IPC (IPC8): G06F16/535; G06F16/58; G06F16/583; G06F16/587; G06N5/04
CPC: G06F16/535; G06F16/5846; G06F16/5866; G06F16/587; G06N5/04; Y02D10/00
Inventor: 何小海, 张津, 刘露平, 卿粼波, 罗晓东, 陈洪刚, 吴小强
Owner SICHUAN UNIV