
Attention mechanism-based image description generation method

An image description and attention technology, applied to computer components, biological neural network models, instruments, etc., which can solve problems such as error accumulation, difficulty in focusing on target objects, and loss of image information.

Pending Publication Date: 2020-01-10
WUHAN UNIV

Problems solved by technology

However, such models usually rely on global or object-level image features. With these features alone it is difficult to focus on the salient target objects in an image, much important image information is lost, and the important visual semantic relationship information in the image cannot be fully exploited by the model.
Moreover, most existing models are a one-step forward process: when generating the next word, the model can use only the words it has already generated, so a single wrong word produced during generation causes errors to accumulate through the rest of the sentence.
In addition, existing models are trained by maximizing the joint probability of the generated sequence, i.e., by minimizing a cross-entropy loss through back-propagation so as to maximize the joint probability of the reference words. What the model learns is therefore the probability distribution of the words in a sentence, which differs from the automatic evaluation metrics usually used to judge the quality of sentences generated by an image description model. Those evaluation metrics are not differentiable and cannot be used directly as a loss function, and this mismatch between the training loss and the evaluation metrics prevents the model from being fully optimized.
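The mismatch described above is commonly bridged with a policy-gradient (REINFORCE-style) surrogate loss, which is what the patent's later "reinforcement learning" optimization refers to in general terms. The sketch below is illustrative only, not the patent's exact formulation: the non-differentiable evaluation score enters the loss as a plain scalar weight on the log-probabilities of the sampled caption, so gradients flow through the log-probabilities while the metric itself is never differentiated.

```python
import numpy as np

def reinforce_loss(log_probs, reward, baseline=0.0):
    """REINFORCE surrogate loss for sequence generation.

    `log_probs` are the per-word log-probabilities of a sampled caption;
    `reward` is a non-differentiable sentence-level score (e.g. CIDEr);
    `baseline` reduces variance. Gradients flow only through `log_probs`.
    """
    return -(reward - baseline) * np.sum(log_probs)

# Toy example: three sampled words with their log-probabilities.
log_probs = np.log(np.array([0.5, 0.4, 0.6]))
loss = reinforce_loss(log_probs, reward=0.8, baseline=0.5)
```

When the sampled caption scores above the baseline, minimizing this loss raises the probability of that caption; when it scores below, the sign flips and the caption is suppressed.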

Embodiment Construction

[0077] The technical solutions of the present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0078] As shown in Figure 1, an attention mechanism-based image description generation method includes the following steps:

[0079] Step 1: extract words from the annotated caption sentences of the dataset to build a vocabulary;

[0080] The vocabulary in Step 1 is obtained by counting the number of occurrences of each word in the text descriptions of the MS COCO dataset and including only words that appear more than five times. The resulting MS COCO vocabulary contains 9,487 words.

[0081] Step 2: use the ResNet101 model as the initial CNN model and pre-train its parameters on the ImageNet dataset; use the pre-trained ResNet101 alone to extract the global features of the image, and then use the pre-trained ResNet101 to replace the CNN extraction in the Faster R-CNN a...
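The two kinds of features in Step 2 can be illustrated in shape terms, without the actual networks. This is a minimal stand-in, not the patent's implementation: global pooling over a ResNet101-style feature map yields the global feature, and averaging inside bounding boxes crudely imitates the per-object features that Faster R-CNN's RoI pooling would produce.

```python
import numpy as np

def global_feature(feat_map):
    """Global image feature: average-pool a CNN feature map
    of shape (C, H, W) down to a single (C,) vector."""
    return feat_map.mean(axis=(1, 2))

def region_features(feat_map, boxes):
    """Crude stand-in for Faster R-CNN RoI pooling: average the
    feature map inside each (x1, y1, x2, y2) box, one vector per object."""
    return np.stack([feat_map[:, y1:y2, x1:x2].mean(axis=(1, 2))
                     for (x1, y1, x2, y2) in boxes])

# ResNet101's final conv block outputs 2048 channels; 14x14 is a
# typical spatial size for a 448x448 input (an assumption here).
feat = np.random.rand(2048, 14, 14)
g = global_feature(feat)                                   # shape (2048,)
r = region_features(feat, [(0, 0, 7, 7), (7, 7, 14, 14)])  # shape (2, 2048)
```

In the actual method the region vectors come from the detector, but the downstream attention mechanism only needs this (N objects × C channels) layout.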

Abstract

The invention provides an attention mechanism-based image description generation method. First, a more accurate image description is generated by using an attention mechanism over the salient object information in an image and the information about the relationships between objects; then a finer-grained image description is generated with a double-layer language generation model; finally, the whole model is optimized with reinforcement learning. The method has the advantages that image information can be enriched by fusing the relation features and the object features, the double-layer language model can generate finer-grained image descriptions, and further optimizing the trained model with reinforcement learning can relieve the exposure bias problem.
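The attention step in the abstract — weighting object and relation features against the decoder's current state — can be sketched generically. This is a minimal dot-product attention sketch under stated assumptions (the patent does not specify the scoring function), showing only the common pattern: score each feature against a query, normalize with softmax, and return the weighted sum as the context vector.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, features):
    """Generic attention sketch: `features` is an (N, C) matrix of
    object/relation vectors, `query` a (C,) decoder state. Returns the
    (C,) context vector and the (N,) attention weights."""
    scores = features @ query       # similarity of each feature to the query
    weights = softmax(scores)       # attention distribution over features
    return weights @ features, weights

# Toy example: three 2-d features, query aligned with the first axis.
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
q = np.array([2.0, 0.0])
ctx, w = attend(q, feats)
```

Fusing relation features with object features, as the abstract describes, amounts to letting both kinds of vectors appear as rows of `features` so the decoder can attend to either.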

Description

technical field

[0001] The invention belongs to the fields of computer vision and natural language processing and relates to an image language description generation method, in particular to an attention mechanism-based image description generation method.

Background technique

[0002] In many everyday situations, image content must be converted into text descriptions, for example to automatically generate text summaries of images in social software when the network connection is poor, or to help visually impaired people understand image content. Existing image description methods are mainly based on deep learning: a convolutional neural network serves as the image processing model that extracts image features, and these features are fed into a recurrent neural network that serves as the language generation model to produce the description. However, such models usually rely on global or object-level image features. With these features alone it is difficult to focus on the salient target o...

Claims

Application Information

IPC(8): G06K9/62, G06N3/04
CPC: G06N3/044, G06N3/045, G06F18/214, G06F18/253
Inventors: 肖春霞, 赵坤
Owner: WUHAN UNIV