Attention mechanism-based image description generation method

An image description and attention technology, applied to computer components, biological neural network models, instruments and the like, which can solve problems such as error accumulation, difficulty in focusing on salient target objects, and loss of image information.

Pending Publication Date: 2020-01-10
WUHAN UNIV

AI Technical Summary

Problems solved by technology

However, these models usually use global or object-level image features. With such features it is difficult to focus on the salient target objects in the image, a great deal of important image information is lost, and the important visual semantic relationship information in the image is hard to apply fully within the model.
In addition, existing models are mostly single forward-pass processes: when generating the next word, the model can only use the words that have already been generated, so if a wrong word is produced during generation, the error accumulates in the rest of the sentence.
On the other hand, the existing model maximizes the joint probability of the generated sequence during training, which gives rise to the exposure bias problem that the reinforcement-learning optimization described in the abstract is intended to relieve.




Embodiment Construction

[0077] The technical solutions of the present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0078] As shown in Figure 1, an image description generation method based on the attention mechanism includes the following steps:

[0079] Step 1: extract words from the annotated sentences of the dataset to build a vocabulary;

[0080] The vocabulary in Step 1 is obtained by counting the number of occurrences of each word in the text descriptions of the MS COCO dataset and including only the words that appear more than five times. The resulting vocabulary for the MS COCO dataset contains 9,487 words.
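A minimal sketch of the frequency-threshold vocabulary construction described in this step, assuming the caption strings have already been loaded from the MS COCO annotation files; the function name, special tokens, and whitespace tokenization are illustrative assumptions, not details from the patent:

```python
from collections import Counter

def build_vocab(captions, min_count=5):
    """Keep words that appear more than min_count times across all captions."""
    counter = Counter()
    for sentence in captions:
        counter.update(sentence.lower().split())
    # Special tokens for padding, sequence boundaries and out-of-vocabulary words.
    vocab = ["<pad>", "<start>", "<end>", "<unk>"]
    vocab += [word for word, count in counter.items() if count > min_count]
    word_to_idx = {word: idx for idx, word in enumerate(vocab)}
    return vocab, word_to_idx

# Toy usage with two stand-in captions instead of the full MS COCO annotations.
vocab, word_to_idx = build_vocab(["a man riding a horse", "a man on a horse"], min_count=1)
```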

[0081] Step 2: use the ResNet101 model as the initial CNN model, pre-train the parameters of ResNet101 on the ImageNet dataset, use the pre-trained ResNet101 on its own to extract the global features of the image, and then use the pre-trained ResNet101 to replace the CNN feature extractor in Faster R-CNN ...
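A hedged sketch of the global-feature extraction part of this step, using an ImageNet-pretrained ResNet101 from torchvision. Dropping the final classification layer to obtain a 2048-dimensional pooled feature is an assumption about the exact layer used, and the Faster R-CNN backbone replacement is not shown:

```python
import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# ImageNet-pretrained ResNet101 (torchvision >= 0.13 `weights=` API); drop the
# final classification layer so the network outputs a global feature vector.
resnet = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)
resnet = nn.Sequential(*list(resnet.children())[:-1])
resnet.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def global_feature(image_path):
    """Return a (1, 2048) global feature for a single image."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        feat = resnet(img)        # shape (1, 2048, 1, 1) after average pooling
    return feat.flatten(1)        # shape (1, 2048)
```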



Abstract

The invention provides an attention mechanism-based image description generation method. Firstly, a more accurate image description is generated by using an attention mechanism according to the salient object information in the image and the information on the relationships between objects; then, a finer-grained image description is generated by using a double-layer language generation model; finally, the whole model is optimized by using reinforcement learning. The method has the advantages that the image information can be enriched through the fusion of the relation features and the object features, the double-layer language model can generate image descriptions with finer granularity, and the problem of exposure bias can be relieved by further optimizing the trained model through reinforcement learning.
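As an illustration of the general shape of such a model only, the sketch below applies additive attention over fused object and relation region features and feeds the attended context into a two-layer LSTM language decoder. All dimensions, module names, and the specific attention form are assumptions made for the example and are not taken from the patent text:

```python
import torch
import torch.nn as nn

class AttentionCaptionDecoder(nn.Module):
    """Toy two-layer LSTM decoder with additive attention over region features."""
    def __init__(self, feat_dim=2048, hidden_dim=512, embed_dim=512, vocab_size=9487):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm1 = nn.LSTMCell(embed_dim, hidden_dim)              # first language layer
        self.att_feat = nn.Linear(feat_dim, hidden_dim)
        self.att_hid = nn.Linear(hidden_dim, hidden_dim)
        self.att_score = nn.Linear(hidden_dim, 1)
        self.lstm2 = nn.LSTMCell(hidden_dim + feat_dim, hidden_dim)  # second language layer
        self.logits = nn.Linear(hidden_dim, vocab_size)

    def forward(self, word_id, feats, state1, state2):
        # feats: (num_regions, feat_dim) fused object and relation features of one image.
        h1, c1 = self.lstm1(self.embed(word_id), state1)
        # Additive attention: weight each region feature by its relevance to h1.
        scores = self.att_score(torch.tanh(self.att_feat(feats) + self.att_hid(h1))).squeeze(1)
        alpha = torch.softmax(scores, dim=0)
        context = (alpha.unsqueeze(1) * feats).sum(dim=0, keepdim=True)
        h2, c2 = self.lstm2(torch.cat([h1, context], dim=1), state2)
        return self.logits(h2), (h1, c1), (h2, c2)

# One decoding step with random stand-in features (illustration only).
decoder = AttentionCaptionDecoder()
feats = torch.randn(36, 2048)                                # e.g. 36 fused region features
s1 = (torch.zeros(1, 512), torch.zeros(1, 512))
s2 = (torch.zeros(1, 512), torch.zeros(1, 512))
logits, s1, s2 = decoder(torch.tensor([1]), feats, s1, s2)   # id 1 stands in for "<start>"
```

In this sketch the first LSTM tracks the partially generated sentence, the attention module selects which fused region features matter for the next word, and the second LSTM produces the word distribution; reinforcement-learning fine-tuning on the sequence-level reward is not shown.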

Description

technical field

[0001] The invention belongs to the fields of computer vision and natural language processing, and relates to an image language description generation method, in particular to an attention mechanism-based image description generation method.

Background technique

[0002] In many situations in daily life it is necessary to convert image content into text descriptions, for example automatically generating text summaries of images in social software when the network condition is poor, or helping visually impaired people understand image content. Existing image description methods are mainly based on deep learning: a convolutional neural network is used as the image processing model to extract image features, and the image features are input into a recurrent neural network, serving as the language generation model, to generate the image description. However, such models usually use global or object-level image features, with which it is difficult to focus on the salient target o...


Application Information

IPC(8): G06K9/62, G06N3/04
CPC: G06N3/044, G06N3/045, G06F18/214, G06F18/253
Inventor: 肖春霞, 赵坤
Owner: WUHAN UNIV