RNN-based automatic picture description generation method

An automatic picture description generation technology, applied to instruments, character and pattern recognition, computer components, and related fields. It addresses the problems of weak semantic expression and poor utilization, with the effect of improving performance and reducing complexity.

Status: Inactive | Publication Date: 2016-06-01
SOUTH CHINA UNIV OF TECH

AI Technical Summary

Problems solved by technology

The earlier Show and Tell approach processes the entire image as a whole and cannot make good use of the spatial position information in the image.
Show, Attend and Tell: Neural Image Caption Generation with Visual Atten



Examples


Embodiment

[0034] In this embodiment, the RNN-based method for automatically generating image descriptions, as shown in Figure 1, includes the following steps:

[0035] S1: Carry out the training process on a computer:

[0036] S1.1 Collect the data set: download the MS COCO database from http://mscoco.org/. The database contains 300,000 pictures, each with 5 sentences describing the content of the image;

[0037] S1.2 Use a deep learning network (see the paper "ImageNet Classification with Deep Convolutional Neural Networks", Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, NIPS 2012) to extract image features from each picture in the training set. This embodiment takes the output of the last fully connected layer of the network, an m = 4096-dimensional vector $F_i \in \mathbb{R}^{4096}$, as the feature vector of the image;
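
The idea in S1.2, taking the activations of the last fully connected layer as the image feature, can be illustrated with a toy sketch. Everything here is an assumption for illustration: the two-layer network, its random weights, the 12-value "image", and the reduced feature size (8 instead of 4096). The embodiment itself uses a pre-trained deep network of the kind described in the cited paper.

```python
import random

def relu(v):
    # Element-wise rectified linear unit
    return [max(0.0, x) for x in v]

def dense(weights, bias, x):
    # Fully connected layer: one output per weight row
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def make_layer(n_out, n_in, rng):
    # Hypothetical random weights standing in for pre-trained ones
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def extract_feature(pixels, layers):
    # Run the input through every layer and return the activations
    # of the last fully connected layer as the feature vector
    h = pixels
    for weights, bias in layers:
        h = relu(dense(weights, bias, h))
    return h

FEATURE_DIM = 8  # toy stand-in for m = 4096
rng = random.Random(0)
layers = [make_layer(16, 12, rng), make_layer(FEATURE_DIM, 16, rng)]
image = [rng.random() for _ in range(12)]  # toy flattened "image"
feature = extract_feature(image, layers)
```

In a real setting the same pattern applies with the trained network's weights, and `feature` would have 4096 entries per image.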

[0038] S1.3 Part-of-speech screening: collect the College English Test Band 4 and Band 6 (CET-4 and CET-6) vocabulary lists, together with the part of speech of each word;

[0039] Perform part-of-speech screening f...
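
The screening described in S1.3, combined with the abstract's statement that non-noun and non-verb components are removed, can be sketched as a dictionary lookup. The mini lexicon, the example caption, and the choice to drop words missing from the lexicon are assumptions for illustration; the embodiment builds its lexicon from the vocabulary lists named in S1.3.

```python
# Hypothetical mini lexicon mapping each word to its part of speech
POS_LEXICON = {
    "a": "article", "the": "article", "in": "preposition",
    "red": "adjective", "dog": "noun", "ball": "noun",
    "park": "noun", "catches": "verb",
}
KEEP = {"noun", "verb"}

def screen_caption(caption, lexicon=POS_LEXICON, keep=KEEP):
    # Lowercase, drop the trailing period, and split on whitespace
    words = caption.lower().strip(".").split()
    # Keep only nouns and verbs; words absent from the lexicon are
    # dropped as well (an assumption of this sketch)
    return [w for w in words if lexicon.get(w) in keep]

screened = screen_caption("A dog catches the red ball in the park.")
print(screened)  # ['dog', 'catches', 'ball', 'park']
```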


Abstract

The invention discloses an RNN-based method for automatically generating picture descriptions. First, a pre-trained deep network extracts image features. Next, all words other than nouns and verbs are removed from the training sentences. An LSTM network is then trained jointly on the image features and the lexical features. At generation time, the input image and the trained LSTM network produce a sentence made up of nouns and verbs, and the final output sentence is then generated with the help of a large corpus from the Internet. The method automatically recognizes and understands a digital image uploaded by the user and generates a natural-language sentence that a human can understand.
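
For readers unfamiliar with the LSTM mentioned above, the following is a minimal sketch of one standard LSTM time step in plain Python. The dimensions, random weights, and the toy input sequence (a feature vector followed by two word vectors) are assumptions for illustration; the abstract does not specify how the image and lexical features are fed to the network.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, b, v):
    # Matrix-vector product plus bias
    return [sum(w * x for w, x in zip(row, v)) + bi
            for row, bi in zip(W, b)]

def lstm_step(x, h_prev, c_prev, params):
    # params maps gate name -> (W, b), each acting on [x; h_prev]
    z = x + h_prev  # list concatenation
    i = [sigmoid(a) for a in affine(*params["i"], z)]    # input gate
    f = [sigmoid(a) for a in affine(*params["f"], z)]    # forget gate
    o = [sigmoid(a) for a in affine(*params["o"], z)]    # output gate
    g = [math.tanh(a) for a in affine(*params["g"], z)]  # candidate
    c = [fk * ck + ik * gk
         for fk, ck, ik, gk in zip(f, c_prev, i, g)]     # new cell state
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]     # new hidden state
    return h, c

INPUT_DIM, HIDDEN_DIM = 3, 4  # toy sizes
rng = random.Random(0)

def init_gate():
    # Hypothetical random weights; training would learn these
    W = [[rng.uniform(-0.5, 0.5) for _ in range(INPUT_DIM + HIDDEN_DIM)]
         for _ in range(HIDDEN_DIM)]
    return W, [0.0] * HIDDEN_DIM

params = {name: init_gate() for name in ("i", "f", "o", "g")}

# Toy sequence: one feature-like vector followed by two word vectors
inputs = [[0.2, 0.1, 0.4], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
h, c = [0.0] * HIDDEN_DIM, [0.0] * HIDDEN_DIM
for x in inputs:
    h, c = lstm_step(x, h, c, params)
```

In a caption generator, the hidden state `h` at each step would typically pass through an output layer that scores the next word; that layer is omitted here.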

Description

Technical field

[0001] The invention relates to the field of artificial intelligence, in particular to an RNN-based method for automatically generating image descriptions.

Background art

[0002] Automatic image content description is a technology that has emerged in artificial-intelligence image recognition in recent years. Its purpose is to automatically express the content of an image in natural language through algorithms. "Show and Tell: A Neural Image Caption Generator" (Oriol Vinyals, 2014) first extracts image features using a deep network and then uses an RNN model to convert the image features into a text description, thereby extracting the semantic information of the image. However, it processes the entire image as a whole and cannot make good use of the spatial position information in the image. "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention" (Kelvin Xu, 2015) adds saliency detection on top of it, which makes use of the spatial in...


Application Information

IPC(8): G06K9/62
CPC: G06F18/214
Inventors: Guo Lihua (郭礼华), Liao Qijun (廖启俊)
Owner: SOUTH CHINA UNIV OF TECH