Fine-grained image-text retrieval method and system based on Transformer model

A fine-grained, Transformer-model-based technology, applied in unstructured text data retrieval, digital data information retrieval, biological neural network models, and related fields. It addresses problems such as unsatisfactory retrieval accuracy, achieving excellent retrieval results with the effect of improved performance and high retrieval precision.

Pending Publication Date: 2022-07-22
浙大宁波理工学院

AI Technical Summary

Problems solved by technology

The current retrieval method based on deep learning can effectively rea…



Examples


Example Embodiment

[0075] Example

[0076] As shown in Figure 1, a fine-grained image-text retrieval method based on the Transformer model includes the following steps:

[0077] S01: Obtain the region vector group of the target regions of the image and the word vector group of the text;

[0078] S02: Use a trained self-attention Transformer model to optimize each target region of the image, so that each target region obtains effective information from the surrounding target regions; likewise, use a trained self-attention Transformer model to determine the meaning of the current target word from the semantic context of the text;
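The excerpt does not give implementation details for the self-attention step; as a minimal sketch, assuming single-head scaled dot-product attention with toy random projection weights (the shapes and initialization here are illustrative, not the patent's), the idea of each region vector absorbing information from the others can be written as:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (n, d) stack of region (or word) vectors.
    Each output row is a weighted mix of all rows of V, which is how
    one target region can absorb context from the surrounding regions.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])           # (n, n) pairwise affinities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
    return weights @ V                               # context-enriched vectors

rng = np.random.default_rng(0)
n, d = 5, 8                                          # 5 regions, 8-dim features (toy sizes)
X = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                                     # (5, 8): one enriched vector per region
```

The same function applies unchanged to the word vector group of the text, which is why the step uses one self-attention model per modality.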

[0079] S03: Use a trained mutual-attention Transformer model to process cross-modal information and exchange information between the image and the text, so that the region vector group incorporates key information and the word vector group incorporates detail information;
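The mutual-attention step is likewise not spelled out in the excerpt; a common reading (an assumption here, not the patent's stated design) is cross-attention, where the queries come from one modality and the keys/values from the other, run in both directions:

```python
import numpy as np

def cross_attention(Q_in, KV_in, Wq, Wk, Wv):
    """Queries from one modality attend over keys/values from the other."""
    Q, K, V = Q_in @ Wq, KV_in @ Wk, KV_in @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ V

rng = np.random.default_rng(1)
d = 8
regions = rng.standard_normal((5, d))    # image-side region vectors (toy data)
words = rng.standard_normal((7, d))      # text-side word vectors (toy data)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

# Image attends to text, and text attends to image, so each modality's
# vectors end up carrying information from the other.
regions_ctx = cross_attention(regions, words, Wq, Wk, Wv)   # (5, 8)
words_ctx = cross_attention(words, regions, Wq, Wk, Wv)     # (7, 8)
print(regions_ctx.shape, words_ctx.shape)
```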

[0080] S04: Calculate the similarity between each finally obtained region vector and each word vector, obtain a fine-grained semantic similarity matrix of the input image and the input text, and derive the retrieval result from it.
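The excerpt does not say which similarity measure or aggregation the method uses; as a hedged sketch, assuming cosine similarity for the fine-grained matrix and a max-then-mean aggregation (an illustrative choice, not taken from the patent) to turn the matrix into a single retrieval score:

```python
import numpy as np

def similarity_matrix(regions, words):
    """Cosine similarity between every region vector and every word vector."""
    R = regions / np.linalg.norm(regions, axis=1, keepdims=True)
    W = words / np.linalg.norm(words, axis=1, keepdims=True)
    return R @ W.T                       # (n_regions, n_words) fine-grained matrix

rng = np.random.default_rng(2)
regions = rng.standard_normal((5, 8))    # toy region vectors
words = rng.standard_normal((7, 8))      # toy word vectors
S = similarity_matrix(regions, words)

# Illustrative aggregation: for each word take its best-matching region,
# then average over words to score the image-text pair.
score = S.max(axis=0).mean()
print(S.shape)                           # (5, 7)
```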



Abstract

The invention discloses a fine-grained image-text retrieval method based on a Transformer model, comprising the following steps: acquiring a region vector group for the target regions of an image and a word vector group for a text; optimizing each target region of the image with a trained self-attention Transformer model, so that each target region obtains effective information from the surrounding target regions; determining the meaning of the current target word from the semantic context of the text with a trained self-attention Transformer model; performing cross-modal information processing with a trained mutual-attention Transformer model, so that the region vector group incorporates key information and the word vector group incorporates detail information; and calculating the similarity between each finally obtained region vector and each word vector, obtaining a fine-grained semantic similarity matrix of the input image and the input text, and deriving the retrieval result. Retrieval performance and accuracy are thereby improved.

Description

Technical field [0001] The invention belongs to the technical field of image-text retrieval, and in particular relates to a fine-grained image-text retrieval method and system based on a Transformer model. Background technique [0002] With the development of Internet technology, various applications and web pages generate large numbers of pictures and texts every day, and these pictures and texts may be related to each other. [0003] In practical applications, images corresponding to a text can be retrieved with cross-modal retrieval algorithms. In the related art, a cross-modal retrieval algorithm mainly extracts the image features of all pictures in a picture library and the text features of the text, determines the similarity between each picture and the text from these features, and then retrieves from the picture library the image with the highest similarity to the text. […]
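The related-art pipeline described above (one global feature per item, ranked by similarity) can be sketched as follows; the feature extractors are omitted and replaced with toy vectors, and cosine similarity is an assumed choice:

```python
import numpy as np

def retrieve(text_vec, image_vecs):
    """Related-art baseline: rank gallery images by cosine similarity to the text."""
    t = text_vec / np.linalg.norm(text_vec)
    I = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)
    sims = I @ t                          # one score per gallery image
    return int(np.argmax(sims)), sims     # index of the best match, all scores

rng = np.random.default_rng(3)
gallery = rng.standard_normal((10, 16))               # 10 images, toy global features
query = gallery[4] + 0.05 * rng.standard_normal(16)   # text feature near image 4
best, sims = retrieve(query, gallery)
print(best)                               # 4: the lightly noised copy is recovered
```

This global-matching scheme is what the fine-grained, region-and-word-level method of the invention improves upon.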

Claims


Application Information

IPC(8): G06F16/532, G06F16/583, G06F16/33, G06N3/04, G06V10/25, G06V10/82
CPC: G06F16/532, G06F16/583, G06F16/3334, G06F16/3344, G06N3/047, G06N3/048, G06N3/044, G06N3/045
Inventor: 张百灵, 潘正新, 武芳宇
Owner: 浙大宁波理工学院