RNN-based automatic picture description generation method

A technology for automatically generating picture descriptions, applied to instruments, character and pattern recognition, computer components, and similar fields. It addresses the problems of weak semantic expression and poor utilization, improving performance and reducing complexity.

Status: Inactive. Publication Date: 2016-06-01

SOUTH CHINA UNIV OF TECH

Cites: 2. Cited by: 30.


AI Technical Summary
This helps you quickly interpret patents by identifying the three key elements:
Problems solved by technology
Method used
Benefits of technology

Problems solved by technology

Show and Tell: A Neural Image Caption Generator (Oriol Vinyals et al., 2014) processes the entire image as a whole, so it cannot make good use of the spatial position information in the image.

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (Kelvin Xu et al., 2015) adds saliency detection on top of that approach and so makes some use of the image's spatial information, but it uses only a simple RNN model, and its semantic expression is weak.


Examples


Embodiment

[0034] As shown in Figure 1, the RNN-based method for automatically generating image descriptions in this embodiment includes the following steps:

[0035] S1. Run the training process on a computer:

[0036] S1.1 Collect the data set: download the MS COCO database from http://mscoco.org/. The database contains 300,000 pictures, each with 5 sentences describing the content of the image;
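A minimal sketch of how the picture-to-sentences pairing from S1.1 might be organized in memory. It assumes the layout of a COCO caption annotation file (an `images` list and an `annotations` list whose entries carry `image_id` and `caption`); the sample records and the function name `group_captions` are made up for illustration and are not taken from the patent.

```python
from collections import defaultdict

# Made-up records in the assumed layout of a COCO caption annotation file.
sample = {
    "images": [
        {"id": 1, "file_name": "000001.jpg"},
        {"id": 2, "file_name": "000002.jpg"},
    ],
    "annotations": [
        {"image_id": 1, "caption": "A man rides a horse."},
        {"image_id": 2, "caption": "A cat sleeps on a sofa."},
        {"image_id": 1, "caption": "A person on a brown horse."},
    ],
}


def group_captions(data):
    """Map each image file name to the list of its captions, in file order."""
    names = {img["id"]: img["file_name"] for img in data["images"]}
    grouped = defaultdict(list)
    for ann in data["annotations"]:
        grouped[names[ann["image_id"]]].append(ann["caption"])
    return dict(grouped)


captions_by_image = group_captions(sample)
print(captions_by_image["000001.jpg"])
```

With the real data set, each list would hold the five descriptive sentences mentioned above.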

[0037] S1.2 Use a deep learning network (see the paper "ImageNet Classification with Deep Convolutional Neural Networks", Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, NIPS 2012) to extract image features from each picture in the training set. This embodiment takes the output of the network's last fully connected layer, an m = 4096-dimensional vector $F_i \in \mathbb{R}^{4096}$, as the feature vector of the image;

[0038] S1.3 Part-of-speech screening: collect the Level 4 and Level 6 English vocabulary lists, together with the part of speech of each word;

[0039] Perform part-of-speech screening f...
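The paragraph above is cut off, so the exact screening rule is not visible here. Going by the abstract (non-noun and non-verb components are removed from the words in the sentence), a minimal sketch might look like the following. The tiny lexicon, the tag names, and the function `screen_caption` are hypothetical, and dropping words that are missing from the lexicon is an assumption rather than something the source states.

```python
# Hypothetical word -> part-of-speech lexicon. The patent builds its own
# from the Level 4 and Level 6 vocabulary lists; this one is illustrative.
POS_LEXICON = {
    "a": "article", "man": "noun", "is": "verb", "riding": "verb",
    "brown": "adjective", "horse": "noun", "on": "preposition",
    "the": "article", "beach": "noun",
}
KEPT_TAGS = {"noun", "verb"}


def screen_caption(caption, lexicon=POS_LEXICON, kept=KEPT_TAGS):
    """Return the caption's words whose tag is in `kept`, in original order.

    Assumption: words that are not in the lexicon are dropped.
    """
    words = caption.lower().rstrip(".").split()
    return [w for w in words if lexicon.get(w) in kept]


skeleton = screen_caption("A man is riding a brown horse on the beach.")
print(skeleton)  # ['man', 'is', 'riding', 'horse', 'beach']
```

The resulting noun and verb skeleton is the kind of reduced sentence that the abstract says is paired with the image features for training.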


Abstract

The invention discloses an RNN-based method for automatically generating picture descriptions. A pre-trained deep network is first used to extract image features; non-noun and non-verb components are removed from the words in each sentence; an LSTM network is then trained jointly on the image features and the lexical features. During sentence generation, the trained LSTM network produces a sentence made up of nouns and verbs from the input image, and the final output sentence is then generated with the help of a large online corpus. The method automatically recognizes and understands a digital image uploaded by the user and generates a natural-language sentence that a human can understand.
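For orientation, the update rule of a standard LSTM cell is given below. This is the textbook formulation, not necessarily the variant the patent uses; how the image feature $F_i$ and the noun and verb words enter as the inputs $x_t$ is not specified in this excerpt, and the weight names $W$, $U$, $b$ are notation introduced here.

```latex
\begin{align}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
g_t &= \tanh(W_g x_t + U_g h_{t-1} + b_g) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t && \text{(cell state)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{align}
```

Here $\sigma$ is the logistic sigmoid and $\odot$ is element-wise multiplication; the gated cell state $c_t$ is what distinguishes an LSTM from the simple RNN that the background section criticizes.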

Description

Technical field

[0001] The invention relates to the field of artificial intelligence, in particular to an RNN-based method for automatically generating image descriptions.

Background technique

[0002] Automatic image content description is a technology that has emerged in artificial intelligence image recognition in recent years. Its purpose is to automatically express the content of an image in natural language by means of algorithms. Show and Tell: A Neural Image Caption Generator (Oriol Vinyals et al., 2014) first extracts image features using a deep network and then uses an RNN model to convert the image features into a text description, thereby extracting the semantic information of the image. However, it processes the entire image as a whole and cannot make good use of the spatial position information in the image. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (Kelvin Xu et al., 2015) adds saliency detection on top of it, which makes use of the spatial in...


Application Information


IPC(8): G06K9/62

CPC: G06F18/214

Inventors: 郭礼华, 廖启俊

Owner: SOUTH CHINA UNIV OF TECH
