Image description method

An image description technique, applied in the field of computer vision, that addresses three problems: the loss of a large amount of detailed information from an image's visual scene, the difficulty of parallelizing optimization, and the high time cost of model training.

Active Publication Date: 2020-08-11

EAST CHINA NORMAL UNIV

4 cites, 13 cited by


Problems solved by technology

Even when an attention mechanism is combined with the encoder-decoder architecture to extract region-of-interest (ROI) features from the global features and focus on the image's ROIs, a large amount of detailed information in the image's visual scene is still lost during caption generation.

Thus, the encoder-decoder model with an attention mechanism faces the following two challenges:

1) When the image contains complex objects and attributes, the regional features extracted from the global image feature map cannot represent the objects' semantics well.

2) The inherently sequential nature of RNNs makes parallel optimization difficult, resulting in high time costs for model training.
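The second challenge can be made concrete with two standard formulas. This is only an illustrative sketch: the symbols (hidden state h_t, input x_t, update function f, and the query, key, and value matrices Q, K, V with key dimension d_k) are assumptions introduced here and do not appear in the patent excerpt. The first line is a generic RNN recurrence, where step t cannot start until step t-1 has finished. The second is scaled dot-product attention as used in Transformers, where all positions are computed in one matrix product.

```latex
h_t = f(h_{t-1}, x_t), \qquad t = 1, \dots, T

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

Because the attention output for one position does not depend on the output for any other position, the whole sequence can be processed in parallel during training.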

Examples

Embodiment 1

[0040] Referring to Figure 1, the present invention performs image description with a multi-level Transformer that fuses fine-grained features, according to the following steps:

[0041] Step 1: Obtain an open-source, annotated image description dataset and divide it into a training set, a validation set, and a test set. The dataset used is MSCOCO 2014, split into 113,287 images for training, 5,000 for validation, and 5,000 for testing.
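The split sizes in Step 1 can be sketched as follows. The excerpt does not say how images are assigned to each set, so the seeded random shuffle, the function name `split_dataset`, and the use of plain integer ids are all assumptions made for illustration.

```python
import random

def split_dataset(image_ids, n_val=5000, n_test=5000, seed=0):
    """Shuffle image ids and split them into train/val/test lists."""
    ids = list(image_ids)
    if n_val + n_test > len(ids):
        raise ValueError("validation and test sizes exceed the dataset size")
    # Assumption: a seeded shuffle; the source does not describe the assignment.
    random.Random(seed).shuffle(ids)
    val = ids[:n_val]
    test = ids[n_val:n_val + n_test]
    train = ids[n_val + n_test:]
    return train, val, test

# 113,287 + 5,000 + 5,000 = 123,287 images in total (hypothetical integer ids).
train, val, test = split_dataset(range(123_287))
print(len(train), len(val), len(test))  # 113287 5000 5000
```

Fixing the seed keeps the three sets identical across runs, which matters when results are compared between experiments.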

[0042] Step 2: Use the BERT tool to encode each word in the image descriptions, obtaining a fixed-length word vector for each word and forming the corresponding vocabulary. Each dimension of a word vector represents a word feature, and the vector dimension is 1024.
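The vocabulary and fixed-length vectors of Step 2 can be sketched without loading a real model. This is a stand-in, not BERT: the random embedding table, the whitespace tokenizer, and the helper names `build_vocab`, `make_embedding_table`, and `embed` are assumptions for illustration. Only the dimension 1024 comes from the text (it matches the hidden size of BERT-large).

```python
import random

EMBED_DIM = 1024  # word-vector dimension stated in Step 2

def build_vocab(captions):
    """Map each distinct lowercase whitespace token to an integer index."""
    vocab = {}
    for caption in captions:
        for token in caption.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def make_embedding_table(vocab, dim=EMBED_DIM, seed=0):
    """Stand-in for BERT: one random fixed-length vector per vocabulary entry."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in vocab]

def embed(caption, vocab, table):
    """Look up the vector of every known token in a caption."""
    return [table[vocab[t]] for t in caption.lower().split() if t in vocab]

captions = ["a dog runs on the grass", "a cat sits on the sofa"]
vocab = build_vocab(captions)
table = make_embedding_table(vocab)
vectors = embed("a dog sits", vocab, table)
print(len(vocab), len(vectors), len(vectors[0]))  # 9 3 1024
```

In a real pipeline the table lookup would be replaced by the output of a pretrained BERT model, which uses subword tokenization and produces context-dependent vectors.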

[0043] Step 3: Referring to Figure 2, use the Faster-RCNN tool to extract the feature vectors of the image regions...

Abstract

The invention discloses an image description method that uses a bilinear encoder and a multimodal decoder to improve image description with fine-grained region object features. In the encoder, bilinear pooling encodes the fine-grained region image features, a simple Transformer encoder encodes the region-of-interest features of the image, and all encoded features are fused through a gate structure to form the overall encoding features of the image. In the decoder, multimodal features are extracted from the fine-grained region image features and category features and fused with the overall encoding features, and the semantic information is decoded to generate a description. Compared with the prior art, the method provides a new solution for image description and its applications, and is simple, convenient, and efficient.
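The two encoder operations named in the abstract, bilinear pooling and gate-based fusion, can be sketched in plain Python. The abstract does not give the exact formulas, so everything here is an assumption: the outer-product form of bilinear pooling, the signed square root and L2 normalization (a common post-processing choice), the sigmoid gate g = sigmoid(W[a; b] + bias), and the function names.

```python
import math

def bilinear_pool(x, y):
    """Flattened outer product of x and y, then signed sqrt and L2 normalization."""
    outer = [xi * yj for xi in x for yj in y]
    signed = [math.copysign(math.sqrt(abs(v)), v) for v in outer]
    norm = math.sqrt(sum(v * v for v in signed)) or 1.0
    return [v / norm for v in signed]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gated_fusion(a, b, weights, bias):
    """Element-wise gate g = sigmoid(W [a; b] + bias); returns g*a + (1-g)*b."""
    concat = a + b
    fused = []
    for i, (ai, bi) in enumerate(zip(a, b)):
        g = sigmoid(sum(w * c for w, c in zip(weights[i], concat)) + bias[i])
        fused.append(g * ai + (1.0 - g) * bi)
    return fused

region = [1.0, -2.0]                      # hypothetical fine-grained region feature
pooled = bilinear_pool(region, region)    # length 2 * 2 = 4
roi = [0.5, 0.5, 0.5, 0.5]                # hypothetical encoded ROI feature
weights = [[0.0] * 8 for _ in range(4)]   # zero weights give g = 0.5 everywhere
bias = [0.0] * 4
fused = gated_fusion(pooled, roi, weights, bias)
print(len(pooled), len(fused))  # 4 4
```

With zero weights the gate is 0.5 and the output is the plain average of the two inputs. Learned weights would let the model decide, per dimension, how much of each feature stream to keep.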

Description

Technical field

[0001] The invention relates to the field of computer vision, and in particular to a method for enriching image descriptions using a multi-level Transformer model that fuses fine-grained features.

Background

[0002] Image captioning generates natural-language descriptions for images, and the generated descriptions help applications understand the semantics expressed in an image's visual scene. For example, image captioning can turn image retrieval into text retrieval, which can be used to classify images and improve retrieval results. People usually need only a quick glance to describe the details of an image's visual scene, but automatically describing an image is a comprehensive and difficult computer vision task that requires converting the complex information contained in the image into a natural-language description. Compared with ordinary computer vision tasks, image captioning requires not only recognizing objects fr...

Application Information

Patent Type & Authority: Applications (China)

IPC (8): G06K9/32, G06K9/46, G06K9/62, G06N3/04, G06T9/00, G06F40/30, G06F40/295, G06F40/284

CPC: G06F40/295, G06F40/284, G06F40/30, G06T9/002, G06V10/25, G06V10/44, G06N3/045, G06F18/24, G06F18/253, G06F18/214

Inventors: 王俊豪, 罗雪妮, 罗轶凤, 钱卫宁, 周傲英

Owner: EAST CHINA NORMAL UNIV
