Image description method

An image description method, applied in the field of computer vision, that addresses the loss of a large amount of detailed information about an image's visual scene, the difficulty of parallel optimization, and the high time cost of model training.

Active Publication Date: 2020-08-11
EAST CHINA NORMAL UNIV

AI Technical Summary

Problems solved by technology

Even when an attention mechanism is combined with the encoder-decoder architecture to extract region-of-interest (ROI) features from global features and focus on image ROIs, a large amount of detailed information about the image's visual scene is still lost during generation.
The encoder-decoder model with attention therefore faces two challenges: 1) When the image contains complex objects and attributes, the regional features extracted from the global image feature map cannot represent object semantics well.
2) The inherently sequential nature of RNNs makes parallel optimization difficult, resulting in a high time cost for model training.
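
The second challenge can be illustrated with a toy scalar sketch. It is not taken from the patent: `rnn_states`, `self_attention`, and the weights `w_h` and `w_x` are hypothetical names chosen for illustration. The recurrent loop needs the previous state at every step, whereas each attention output depends only on the inputs and could be computed independently.

```python
import math

def rnn_states(inputs, w_h=0.5, w_x=1.0):
    """Toy scalar RNN: each state depends on the previous state,
    so the time steps must be evaluated one after another."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states

def attend(query, inputs):
    """Toy scalar attention for one position: softmax over query*key scores."""
    scores = [query * k for k in inputs]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w / total * v for w, v in zip(weights, inputs))

def self_attention(inputs):
    """Each output uses only the raw inputs, never another output,
    so every position could be computed in parallel."""
    return [attend(q, inputs) for q in inputs]

seq = [0.1, 0.4, -0.2, 0.3]
print(len(rnn_states(seq)), len(self_attention(seq)))
```

In a real model the scalars become vectors and matrices, but the dependency structure is the same: the recurrence forms a chain over time, while attention has no such chain.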



Examples


Embodiment 1

[0040] Referring to Figure 1, the present invention performs image description with a multi-level Transformer that fuses fine-grained features, according to the following steps:

[0041] Step 1: Obtain an open-source, labeled image description data set and divide it into a training set, a validation set, and a test set. The data set used is MSCOCO 2014, divided into 113,287 images for training, 5,000 for validation, and 5,000 for testing.
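
The text gives only the split sizes, not how the split is drawn. As a minimal sketch, assuming a seeded random shuffle over image identifiers (the function name `split_dataset` and the integer ids are hypothetical), the three subsets could be produced like this:

```python
import random

def split_dataset(image_ids, n_val=5000, n_test=5000, seed=0):
    """Shuffle image ids reproducibly, then carve out validation and test sets.
    Whatever remains becomes the training set."""
    ids = list(image_ids)
    if len(ids) < n_val + n_test:
        raise ValueError("not enough images for the requested split")
    random.Random(seed).shuffle(ids)
    val = ids[:n_val]
    test = ids[n_val:n_val + n_test]
    train = ids[n_val + n_test:]
    return train, val, test

# 113,287 + 5,000 + 5,000 = 123,287 images in total (sizes from Step 1).
all_ids = range(123_287)
train_ids, val_ids, test_ids = split_dataset(all_ids)
print(len(train_ids), len(val_ids), len(test_ids))  # 113287 5000 5000
```

In practice a published split file is often used instead of a fresh shuffle, so that results stay comparable across papers; the seed here only makes the sketch reproducible.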

[0042] Step 2: Use the BERT tool to encode each word in the image descriptions, obtaining a fixed-length word vector for each word and forming the corresponding vocabulary. Each dimension of a word vector represents a word feature, and the vector dimension is 1024.
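
The vocabulary and lookup part of this step can be sketched without loading BERT. The code below is an illustration under stated assumptions: `build_vocab`, `make_placeholder_embeddings`, and `encode` are hypothetical helpers, whitespace tokenization stands in for BERT's tokenizer, and the random table is only a placeholder for real BERT vectors. The one value taken from the text is the dimension, 1024.

```python
import random
from collections import Counter

EMBED_DIM = 1024  # word-vector dimension stated in Step 2

def build_vocab(captions, min_count=1):
    """Map each word seen at least min_count times to an integer index."""
    counts = Counter(w for c in captions for w in c.lower().split())
    vocab = {"<pad>": 0, "<unk>": 1}
    for word, n in sorted(counts.items()):
        if n >= min_count:
            vocab[word] = len(vocab)
    return vocab

def make_placeholder_embeddings(vocab, dim=EMBED_DIM, seed=0):
    """Random stand-in for BERT word vectors: one fixed-length row per entry."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in vocab]

def encode(caption, vocab, table):
    """Look up one vector per word; unknown words fall back to <unk>."""
    unk = vocab["<unk>"]
    return [table[vocab.get(w, unk)] for w in caption.lower().split()]

captions = ["a dog runs on the grass", "a cat sits on the sofa"]
vocab = build_vocab(captions)
table = make_placeholder_embeddings(vocab)
vectors = encode("a dog sits", vocab, table)
print(len(vocab), len(vectors), len(vectors[0]))  # 11 3 1024
```

Replacing the placeholder table with vectors produced by a BERT model of hidden size 1024 would leave the lookup logic unchanged.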

[0043] Step 3: Referring to Figure 2, use the Faster-RCNN tool to extract feature vectors for the image regions of each image...



Abstract

The invention discloses an image description method that uses a bilinear encoder and a multi-modal decoder to improve image description with fine-grained region object features. In the encoder, bilinear pooling encodes the fine-grained region image features, a simple Transformer encoder encodes the region-of-interest features of the image, and all encoded features are fused through a gate structure to serve as the overall encoding features of the image. In the decoder, multi-modal features are extracted from the fine-grained region image features and category features and fused with the overall encoding features, and the semantic information is decoded to generate a description. Compared with the prior art, the method provides a new solution for image description and its applications, and is simple, convenient, and efficient.
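
The abstract names bilinear pooling and a gate structure but does not give their formulas. The sketch below uses one common form of each as an assumption: bilinear pooling as a flattened outer product, and a gate `g = sigmoid(W_a a + W_b b + bias)` that mixes two equal-length feature vectors as `g * a + (1 - g) * b`. All function and parameter names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def bilinear_pool(x, y):
    """Assumed form: outer product of two feature vectors, flattened row by row."""
    return [xi * yj for xi in x for yj in y]

def gated_fusion(a, b, W_a, W_b, bias):
    """Assumed form: g = sigmoid(W_a a + W_b b + bias); out = g*a + (1-g)*b."""
    pre = [p + q + c for p, q, c in zip(matvec(W_a, a), matvec(W_b, b), bias)]
    gate = [sigmoid(z) for z in pre]
    return [g * x + (1.0 - g) * y for g, x, y in zip(gate, a, b)]

# Bilinear pooling of two 2-d vectors gives a 4-d vector.
print(bilinear_pool([1.0, 2.0], [3.0, 4.0]))  # [3.0, 4.0, 6.0, 8.0]

# With zero weights the gate is 0.5 everywhere, so the result is a plain average.
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(gated_fusion([1.0, 0.0], [0.0, 1.0], zeros, zeros, [0.0, 0.0]))  # [0.5, 0.5]
```

In a trained model the gate weights are learned, so the fusion can favor the fine-grained features for some dimensions and the region-of-interest features for others.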

Description

Technical field

[0001] The invention relates to the field of computer vision, in particular to a method for enriching image description with a multi-level Transformer model that fuses fine-grained features.

Background

[0002] Image captioning generates natural language descriptions for images, and the generated descriptions help applications understand the semantics expressed in an image's visual scene. For example, image description can turn image retrieval into text retrieval, which can be used to classify images and improve image retrieval results. People usually need only a quick glance to describe the details of an image's visual scene, but automatically describing an image is a comprehensive and difficult computer vision task that requires converting the complex information contained in the image into a natural language description. Compared with ordinary computer vision tasks, image captioning requires not only recognizing objects fr...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/32, G06K9/46, G06K9/62, G06N3/04, G06T9/00, G06F40/30, G06F40/295, G06F40/284
CPC: G06F40/295, G06F40/284, G06F40/30, G06T9/002, G06V10/25, G06V10/44, G06N3/045, G06F18/24, G06F18/253, G06F18/214
Inventor: 王俊豪, 罗雪妮, 罗轶凤, 钱卫宁, 周傲英
Owner: EAST CHINA NORMAL UNIV