Image description method based on adaptive enhanced self-attention network

An adaptive-enhancement image description technology, applied in fields such as neural learning methods, biological neural network models, and instruments. It addresses two problems: shallow position information cannot cover semantic relationships, and image description models struggle to predict descriptions with reliable semantic relationships. The stated effects are improved feature representation and high-quality, high-precision, credible image description generation.

Active Publication Date: 2022-06-28

UNIV OF SCI & TECH OF CHINA

3 Cites, 0 Cited by

Summary
Abstract
Description
Claims
Application Information

AI Technical Summary

Problems solved by technology

[0003] Visual relationships include geometric positions and semantic interactions, which indicate the interrelationships between targets in region-level representations. However, previous work uses only geometric positions to enhance the representation of visual relationships, and shallow position information alone cannot cover semantic relationships that involve complex actions.

Therefore, a limitation of the state of the art is that image captioning models struggle to generate plausible captions with accurate semantic relationship predictions.



Embodiment Construction

[0032] A preferred embodiment of the present invention will be described in detail below with reference to the accompanying drawings.

[0033] Figure 1 shows the overall network structure. For a given natural scene image, the present invention obtains the final image description through three modules. First, (a) feature extraction: a semantic relation graph is constructed using a scene graph extractor, and a geometric relation graph is constructed using a pretrained Faster-RCNN object detector that detects regions of interest and their bounding boxes. Second, (b) an encoder with an adaptive relation-enhanced attention mechanism: 1) direction-sensitive semantic enhancement considers the association in both directions, from regional features to semantic relations and from semantic relations to regional features, and uses them to jointly represent the complete (subject-predicate-object) triplet information; 2) geometric relationship enhancement dynamically calculates the...
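As a rough illustration of why direction matters in a (subject-predicate-object) triplet, the sketch below stores scene-graph triplets in a directed relation matrix. The names `Triplet` and `directed_relation_matrix`, the example regions, and the predicates are all hypothetical; the excerpt does not show how the invention actually encodes its semantic relation graph.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Triplet:
    """One scene-graph edge (hypothetical structure, not from the patent)."""
    subject: int    # index of the subject region
    predicate: str  # relation label, e.g. "riding"
    obj: int        # index of the object region


def directed_relation_matrix(
    num_regions: int, triplets: List[Triplet]
) -> List[List[Optional[str]]]:
    """Return rel where rel[i][j] is the predicate with region i as subject
    and region j as object, or None if no such relation was extracted."""
    rel: List[List[Optional[str]]] = [
        [None] * num_regions for _ in range(num_regions)
    ]
    for t in triplets:
        rel[t.subject][t.obj] = t.predicate
    return rel


# Illustrative regions and relations (assumed, not taken from the patent).
regions = ["man", "horse", "field"]
triplets = [Triplet(0, "riding", 1), Triplet(1, "standing on", 2)]
rel = directed_relation_matrix(len(regions), triplets)

# The matrix is not symmetric: "man riding horse" does not imply
# "horse riding man", so rel[0][1] is set while rel[1][0] stays None.
print(rel[0][1], rel[1][0])
```

Keeping the two directions in separate cells is one simple way to let a model treat a region differently depending on whether it is the subject or the object of a relation.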


Abstract

The invention relates to the technical field of image description and discloses an image description method based on an adaptive enhanced self-attention network. The method jointly models relationships at the geometric level and the semantic level, so that when a clear geometric or semantic relationship exists between two objects in an image, the two objects can be described accurately. The method adaptively enhances the visual relationships in the given image and achieves high-precision, credible image description generation.

Description

Technical field

[0001] The invention relates to the technical field of image description, in particular to an image description method based on an adaptive enhanced self-attention network.

Background technique

[0002] Image description aims to automatically generate a sentence description for a given image, which can combine vision and language well, and is an important multimodal task. The generated image description should not only identify the objects of interest in the image, but also describe the relationship between the objects. For image captioning, a key challenge is how to accurately and efficiently model the relationship between the identified objects, which is crucial for improving the quality of generation. Recently, geometric information has been widely studied to enhance features at the region level, since geometric features, i.e. relative distances and relative sizes, contain explicit positional relationships between objects.

[0003] Visual relations inclu...
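The relative distances and relative sizes mentioned in paragraph [0002] can be computed directly from bounding boxes. The sketch below uses a log-ratio encoding that is common in the region-level captioning literature; it is an assumption for illustration, not the formula of the present invention, and the function names and the `eps` clamp are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Box = Sequence[float]  # (x_min, y_min, x_max, y_max)


def relative_geometry(
    box_a: Box, box_b: Box, eps: float = 1e-3
) -> Tuple[float, float, float, float]:
    """Geometry of box_b relative to box_a as (dx, dy, dw, dh).

    dx, dy: log of the centre offset, normalised by box_a's width/height.
    dw, dh: log of the width and height ratios (box_b over box_a).
    eps clamps a zero offset so that the logarithm stays finite.
    """
    ax, ay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    bx, by = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    aw, ah = box_a[2] - box_a[0], box_a[3] - box_a[1]
    bw, bh = box_b[2] - box_b[0], box_b[3] - box_b[1]
    dx = math.log(max(abs(bx - ax), eps) / aw)
    dy = math.log(max(abs(by - ay), eps) / ah)
    dw = math.log(bw / aw)
    dh = math.log(bh / ah)
    return (dx, dy, dw, dh)


def geometric_relation_graph(boxes: List[Box]) -> List[List[Tuple[float, ...]]]:
    """Dense pairwise graph: entry [i][j] describes box j relative to box i."""
    return [[relative_geometry(a, b) for b in boxes] for a in boxes]


# Two same-sized boxes side by side (illustrative values).
boxes = [(0.0, 0.0, 10.0, 10.0), (20.0, 0.0, 30.0, 10.0)]
graph = geometric_relation_graph(boxes)
print(graph[0][1])
```

In this example the two boxes have equal size, so the size terms are zero and only the horizontal offset term carries information about their arrangement.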


Application Information


IPC(8): G06V20/00, G06N3/04, G06N3/08, G06V10/82

CPC: G06N3/04, G06N3/08

Inventors: 毛震东, 张勇东, 李经宇

Owner: UNIV OF SCI & TECH OF CHINA
