Image description method based on adaptive enhanced self-attention network

An adaptive-enhancement image description technology, applied in neural learning methods, biological neural network models, instruments, etc. It addresses two problems: existing approaches cannot cover semantic relationships, and image description models have difficulty producing reliable descriptions of those relationships. Stated effects: feature representation, high-quality image description generation, and high-precision, credible image description generation.

Active Publication Date: 2022-06-28
UNIV OF SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

[0003] Visual relationships include geometric positions and semantic interactions, and indicate how targets relate to one another in region-level representations. Previous work, however, uses only geometric positions to enhance the representation of visual relationships, and such shallow position information alone cannot cover complex, action-related semantic relations.
The state of the art is therefore limited: image captioning models have difficulty generating plausible captions with accurate semantic relationship predictions.

Examples

Embodiment Construction

[0032] A preferred embodiment of the present invention will be described in detail below with reference to the accompanying drawings.

[0033] Figure 1 shows the overall network structure. For a given natural scene image, the present invention obtains the final image description through three modules. First, (a) feature extraction: a semantic relation graph is constructed using a scene graph extractor, and a geometric relation graph is constructed using the pretrained object detector Faster-RCNN, which detects regions of interest and their bounding boxes. Second, (b) an encoder with an adaptive relation-enhanced attention mechanism: 1) direction-sensitive semantic enhancement considers the association in both directions, from regional features to semantic relations and from semantic relations to regional features, and uses the two jointly to represent the complete triplet (subject-predicate-object) information; 2) geometric relationship enhancement dynamically calculates the...
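The paragraph above is cut off before it explains how relation information enters the attention computation, so the following is only an illustrative sketch and not the patented mechanism. It assumes one common way of making self-attention relation-aware: a pairwise score (here the hypothetical `relation_bias` matrix) is added to the scaled dot-product logits before the softmax. The function names are invented for this example, and learned query/key/value projections are omitted for brevity.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def relation_biased_attention(features, relation_bias):
    """Single-head self-attention over region features.

    features:      list of N region vectors, each of length d. They serve as
                   queries, keys and values (no learned projections here).
    relation_bias: N x N list of floats; entry [i][j] is added to the
                   attention logit from region i to region j (assumption).
    """
    dim = len(features[0])
    outputs = []
    for i, query in enumerate(features):
        logits = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            + relation_bias[i][j]
            for j, key in enumerate(features)
        ]
        weights = softmax(logits)
        # Weighted sum of all region vectors, component by component.
        outputs.append([
            sum(weights[j] * features[j][c] for j in range(len(features)))
            for c in range(dim)
        ])
    return outputs

regions = [[1.0, 0.0], [0.0, 1.0]]
no_bias = [[0.0, 0.0], [0.0, 0.0]]
strong_link = [[0.0, 50.0], [0.0, 0.0]]  # region 0 strongly related to region 1
plain = relation_biased_attention(regions, no_bias)
biased = relation_biased_attention(regions, strong_link)
```

With `strong_link`, the output for region 0 is dominated by region 1's features, which is the intended effect of a relation term: pairs with a clear relationship attend to each other more strongly.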

Abstract

The invention relates to the technical field of image description and discloses an image description method based on an adaptive enhanced self-attention network. The method jointly models relationships at the geometric level and the semantic level, so that when a clear geometric or semantic relationship exists between two objects in an image, the two objects can be accurately described. The method adaptively enhances the visual relationships in a given image and achieves high-precision, credible image description generation.

Description

Technical field

[0001] The invention relates to the technical field of image description, in particular to an image description method based on an adaptive enhanced self-attention network.

Background

[0002] Image description aims to automatically generate a sentence description for a given image, which can combine vision and language well, and is an important multimodal task. The generated image description should not only identify the objects of interest in the image, but also describe the relationship between the objects. For image captioning, a key challenge is how to accurately and efficiently model the relationship between the identified objects, which is crucial for improving the quality of generation. Recently, geometric information has been widely studied to enhance features at the region level, since geometric features, i.e. relative distances and relative sizes, contain explicit positional relationships between objects.

[0003] Visual relations inclu...
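To make the "relative distances and relative sizes" of paragraph [0002] concrete, here is a minimal sketch of one widely used encoding of pairwise box geometry: log-scaled centre offsets and log size ratios. The patent text shown here does not specify its exact features, so the formula, the `(x1, y1, x2, y2)` box format, and the function name `relative_geometry` are all assumptions made for illustration.

```python
import math

def relative_geometry(box_a, box_b, eps=1e-6):
    """Relative distance and size of box_b with respect to box_a.

    Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1 (assumed format).
    Returns (log|dx|/w_a, log|dy|/h_a, log w_b/w_a, log h_b/h_a).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    a_w, a_h = ax2 - ax1, ay2 - ay1
    b_w, b_h = bx2 - bx1, by2 - by1
    a_cx, a_cy = ax1 + a_w / 2, ay1 + a_h / 2
    b_cx, b_cy = bx1 + b_w / 2, by1 + b_h / 2
    return (
        # eps keeps the logarithm finite when the centres coincide on an axis.
        math.log(max(abs(b_cx - a_cx), eps) / a_w),
        math.log(max(abs(b_cy - a_cy), eps) / a_h),
        math.log(b_w / a_w),
        math.log(b_h / a_h),
    )

person = (0, 0, 10, 10)
horse = (20, 0, 40, 10)
features = relative_geometry(person, horse)
```

For these two boxes the third feature equals `log(2)` because the second box is twice as wide, and the fourth is `0.0` because the heights match. Features of this kind capture where and how large one region is relative to another, but they say nothing about what the two objects are doing, which is the gap the background section points to.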

Application Information

IPC(8): G06V20/00, G06N3/04, G06N3/08, G06V10/82
CPC: G06N3/04, G06N3/08
Inventor: 毛震东, 张勇东, 李经宇
Owner: UNIV OF SCI & TECH OF CHINA