Image description optimization method based on pointer network

A technology for image description optimization, applied in neural learning methods, biological neural network models, instruments, etc. It addresses problems such as mediocre accuracy, large model structure, and high training overhead.

Pending Publication Date: 2020-12-08
NANJING UNIV
Cites: 0; Cited by: 3

AI Technical Summary

Problems solved by technology

[0003] In addition to the attention mechanism, another major line of improvement for Image Caption is to use pre-trained attribute-word detectors, scene classifiers, object detectors, etc. to extract the visual elements of a picture in advance. This decouples the problem: the caption generation model only needs to focus on improving sentence quality, while the task of capturing the picture's visual information is handed over to the sub-modules. Although object detection methods achieve better detection results, they have the disadvantages of a large model structure and high training cost. Detection of visual attribute words, by contrast, is usually converted into a multi-label image classification (Multi-label Image Classification) task, which is relatively easy to implement but yields only average accuracy; it is generally trained using Fully Convolutional Networks (FCN).
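
To make "attribute-word detection as multi-label classification" concrete, the sketch below scores every vocabulary word independently with a sigmoid and keeps the confident ones as candidate attribute words. It is an illustration under stated assumptions, not the patent's detector: the FCN backbone is omitted and the per-word logits are assumed to be given, while `detect_attribute_words`, `multilabel_bce`, `threshold` and `top_k` are hypothetical names and defaults.

```python
from __future__ import annotations

import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def detect_attribute_words(logits: dict[str, float], threshold: float = 0.5, top_k: int = 5) -> list[str]:
    """Pick candidate attribute words from independent per-word scores.

    Multi-label: each word gets its own sigmoid probability, so several
    words can be "present" at once (unlike a single softmax over words).
    Threshold and top_k are hypothetical defaults.
    """
    probs = {word: sigmoid(score) for word, score in logits.items()}
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, p in ranked if p >= threshold][:top_k]


def multilabel_bce(logits: dict[str, float], labels: set[str], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy over the vocabulary, a common multi-label loss."""
    total = 0.0
    for word, score in logits.items():
        p = sigmoid(score)
        y = 1.0 if word in labels else 0.0
        total -= y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps)
    return total / len(logits)
```

For logits such as `{"dog": 2.0, "frisbee": 1.2, "grass": 0.3, "cat": -1.5}`, the sigmoid probabilities are roughly 0.88, 0.77, 0.57 and 0.18, so the default threshold keeps `dog`, `frisbee` and `grass`. Because each word is judged on its own, accuracy depends heavily on the threshold and on how well the detector is calibrated, which is one way to read the "average accuracy" noted above.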

Examples

Embodiment

[0114] The present invention can be used to produce summaries of personal photo records in daily life. For any picture taken by the user, a description sentence can be generated with the model trained by the present invention. This makes later viewing and browsing easier and meets people's need for fast retrieval and classification of pictures in the era of big data.

[0115] To verify the effectiveness of the present invention, it was trained and tested on the Microsoft COCO 2014 dataset. The dataset contains 123,287 pictures in total, and each picture has 4 to 5 human-written annotation sentences. Following the Karpathy Split, 113,287 pictures are used for training (train set), 5,000 for validation (val set), and 5,000 for testing (test set). Training is accelerated with a GTX 1080Ti graphics card and uses the Adam optimizer with the learning rate set to 2e-4.
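
The training setup above uses Adam with a learning rate of 2e-4. The sketch below spells out the standard Adam update on a single scalar parameter so the role of that learning rate is visible. The values beta1 = 0.9, beta2 = 0.999 and eps = 1e-8 are the commonly used defaults and are assumed here, since the excerpt does not state them; `adam_minimize` and the toy objective are illustrative only.

```python
import math
from typing import Callable


def adam_minimize(
    grad_fn: Callable[[float], float],
    theta0: float,
    lr: float = 2e-4,      # learning rate reported in [0115]
    steps: int = 1,
    beta1: float = 0.9,    # assumed common default
    beta2: float = 0.999,  # assumed common default
    eps: float = 1e-8,     # assumed common default
) -> float:
    """Apply `steps` Adam updates to one scalar parameter and return it."""
    theta = theta0
    m = 0.0  # running mean of gradients (first moment)
    v = 0.0  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction: m and v start at zero and would otherwise be too small early on.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta
```

For example, minimizing (θ - 3)² from θ = 0 with `adam_minimize(lambda th: 2 * (th - 3.0), 0.0)` moves θ to about 2e-4 after one step, and to roughly 0.2 after 1,000 steps. While the gradient keeps a consistent sign, m̂/√v̂ stays close to ±1, so the learning rate behaves roughly like a per-step size that is largely independent of the gradient's scale.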

[0116] Some paramete...

Abstract

The invention provides an image description optimization method based on a pointer network. The method comprises the following steps:
(1) extracting visual feature vectors of the input image with a convolutional neural network, and feeding these visual features into a description-sentence generation module composed of a two-layer long short-term memory (LSTM) network;
(2) using an attention mechanism to select the picture regions attended to at each time step, and performing weighted fusion of the image features of these regions;
(3) combining the selected regional features with the state of the sentence generated so far, and using a pointer network mechanism to select the most appropriate word from the pre-detected picture attribute words to fill the word position at the current time step; if no appropriate word can be found among the candidate attribute words, the model automatically decides which word to generate at this time step based on the state of the sentence generated so far;
(4) repeating the above steps to generate each word of the sentence in turn, finally obtaining a description sentence that is closer to the picture content.
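
One way to picture steps (2) and (3) at a single time step is sketched below. It assumes the attention-fused context vector and the LSTM state are already available, scores each pre-detected attribute word against them (the pointer), and mixes that pointer distribution with an ordinary vocabulary distribution through a scalar switch. The soft mixture is borrowed from pointer-generator networks and is an assumption, as are the additive query, the name `pointer_step`, and the gate parameters `gate_w`/`gate_b`; the abstract does not say exactly how the pointer and the fallback are combined.

```python
from __future__ import annotations

import math


def softmax(scores: list[float]) -> list[float]:
    """Softmax with max-subtraction for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def pointer_step(
    context: list[float],
    hidden: list[float],
    attr_words: list[str],
    attr_embs: list[list[float]],
    vocab_probs: dict[str, float],
    gate_w: list[float],
    gate_b: float = 0.0,
) -> tuple[dict[str, float], float]:
    """Return (word distribution, p_copy) for one decoding step.

    context     : attention-fused region feature for this step
    hidden      : decoder (LSTM) state summarizing the sentence so far
    attr_words  : candidate attribute words from the detector
    attr_embs   : one embedding per candidate, same size as context
    vocab_probs : distribution from the ordinary vocabulary softmax
    gate_w/b    : hypothetical switch parameters over [context; hidden]
    """
    # Assumed query: element-wise sum of visual context and sentence state.
    query = [c + h for c, h in zip(context, hidden)]
    # Pointer: a softmax over the candidate attribute words only.
    pointer = softmax([dot(query, e) for e in attr_embs])
    # Switch: probability of copying from the candidates (list + list concatenates).
    p_copy = sigmoid(dot(gate_w, context + hidden) + gate_b)
    # Mix: (1 - p_copy) * vocabulary + p_copy * pointer, so the result sums to 1.
    final = {w: (1.0 - p_copy) * p for w, p in vocab_probs.items()}
    for word, p in zip(attr_words, pointer):
        final[word] = final.get(word, 0.0) + p_copy * p
    return final, p_copy
```

With a neutral gate (`p_copy` = 0.5), a candidate such as "dog" wins when both the pointer and the vocabulary favor it. Pushing `gate_b` strongly negative drives `p_copy` toward 0, so the step falls back to the vocabulary head; this mirrors the abstract's case where no suitable attribute word is found and the sentence state decides the word.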

Description

Technical field
[0001] The invention relates to an image description optimization method based on a pointer network.
Background art
[0002] Image Caption (image description) aims to use a machine to generate a fluent and appropriate description sentence for a picture. This field has attracted a large number of researchers in recent years. Because it involves both computer vision and natural language processing, there is much room for optimization and improvement, and the various approaches are each meaningful and representative. Among them, improvements based on the attention mechanism have played an important role in raising Image Caption performance. Their starting point is to imitate how humans observe and understand the content of a picture. The original Soft/Hard Attention algorithm computes, at each time step, the attention weights assigned to different regions of the picture, realizing a dynamic shift of attention. Since then, ...
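
The Soft/Hard Attention idea described above can be sketched for one time step as follows. Soft attention turns per-region scores into softmax weights and fuses the region features by those weights; hard attention instead samples a single region from the same weights. The dot-product score and the names `soft_attention` and `hard_attention` are simplifying assumptions for illustration: the original Soft Attention formulation scores regions with a small learned network, and the sampling-based training of hard attention is not shown.

```python
from __future__ import annotations

import math
import random


def soft_attention(regions: list[list[float]], hidden: list[float]) -> tuple[list[float], list[float]]:
    """Soft attention for one time step: return (weights, fused context).

    Assumption: each region is scored by a dot product with the decoder state;
    the original formulation uses a learned scoring network instead.
    """
    scores = [sum(r * h for r, h in zip(region, hidden)) for region in regions]
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # non-negative, sums to 1
    dim = len(regions[0])
    # Weighted fusion: every region contributes in proportion to its weight.
    context = [sum(w * region[d] for w, region in zip(weights, regions)) for d in range(dim)]
    return weights, context


def hard_attention(
    regions: list[list[float]], weights: list[float], rng: random.Random
) -> tuple[int, list[float]]:
    """Hard attention: sample exactly one region index according to the weights."""
    idx = rng.choices(range(len(regions)), weights=weights, k=1)[0]
    return idx, regions[idx]
```

With two orthogonal regions and a zero decoder state, soft attention spreads its weight evenly (0.5 each); as the decoder state aligns with one region, almost all weight shifts to it. Recomputing the weights from a new decoder state at every step is what produces the "dynamic shift of attention".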

Application Information

IPC(8): G06K9/62; G06N3/04; G06N3/08; G06F40/30; G06F40/289
CPC: G06F40/30; G06N3/08; G06F40/289; G06N3/044; G06N3/045; G06F18/214
Inventors: 周宇杰 (Zhou Yujie), 商琳 (Shang Lin)
Owner: NANJING UNIV