Image description generation method based on deep LSTM network

An image description and network technology, applied in the field of image understanding, that addresses problems such as insufficient multimodal information transformation, weak semantic information in generated sentences, and difficulty in improving overall performance, thereby improving semantic expression ability, preventing over-fitting, and achieving high accuracy.

Active Publication Date: 2017-05-10
TONGJI UNIV


Problems solved by technology

However, these methods rely heavily on upstream visual techniques, involve complicated processing pipelines, and leave the sentence-generating language model at the back end of the system insufficiently optimized. When LSTM units are used to generate sentences, the model is relatively shallow (one or two LSTM layers are typically used), so the multimodal information is not transformed through enough levels, the semantic information of the generated sentences is weak, and the overall performance is difficult to improve.


Examples


Embodiment

[0057] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments. This embodiment is carried out on the premise of the technical solution of the present invention, and a detailed implementation and a specific operating process are given, but the protection scope of the present invention is not limited to the following embodiments.

[0058] An image description generation method based on a deep LSTM network, as shown in Figure 3, Figure 4 and Figure 5, comprising the steps of:

[0059] 1) Construct a training set, a validation set and a test set, and use the GoogLeNet model to extract the CNN features of each image (see the sketch after these steps); the specific process includes:

[0060] 11) Convert the training set, validation set and test set into HDF5 format, where each image corresponds to multiple labels, and each label is a word in the reference sentence corresponding to the image;

[0061] 12) Read the image, scale it to a size of 256×256, t...
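The patent does not fix a framework for these steps (the HDF5 format suggests a Caffe-era pipeline), and the text is truncated after the 256×256 scaling. The following is a minimal sketch of steps 11) and 12) using PyTorch, torchvision and h5py; the file layout, the torchvision GoogLeNet backbone, the crop/normalization details, and the word-index vocabulary are all illustrative assumptions, not the patent's actual implementation.

```python
# Illustrative sketch of steps 11) and 12): convert (image, caption) pairs to
# HDF5 and extract GoogLeNet CNN features. Framework choice (PyTorch/h5py) and
# all names below are assumptions; the patent does not specify them.
import h5py
import torch
from torchvision import models, transforms
from PIL import Image

# Step 12) preprocessing: read an image and scale it to 256x256.
# The patent text is truncated after this point, so the crop and
# normalization below are conventional guesses.
preprocess = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.CenterCrop(224),          # GoogLeNet's expected input size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# GoogLeNet as a fixed feature extractor: drop the classifier head and
# keep the pooled 1024-d feature vector.
cnn = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)
cnn.fc = torch.nn.Identity()
cnn.eval()

def extract_cnn_feature(path: str) -> torch.Tensor:
    """Return the 1024-d GoogLeNet feature for one image."""
    img = Image.open(path).convert("RGB")
    with torch.no_grad():
        return cnn(preprocess(img).unsqueeze(0)).squeeze(0)

# Step 11): one HDF5 record per image; each label is the index of one word
# of the image's reference sentence (vocabulary assumed given).
def write_split(h5_path, samples, vocab):
    with h5py.File(h5_path, "w") as f:
        for i, (img_path, sentence) in enumerate(samples):
            grp = f.create_group(f"item_{i}")
            grp.create_dataset("feature",
                               data=extract_cnn_feature(img_path).numpy())
            grp.create_dataset("labels",
                               data=[vocab[w] for w in sentence.split()])
```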



Abstract

The invention relates to an image description generation method based on a deep LSTM network, comprising the following steps: (1) extracting the CNN features of the images in an image description dataset, and obtaining embedding vectors for each image and for the words of its reference description sentences; (2) building a double-layer LSTM network, and modeling it in series with the CNN network to generate a multimodal LSTM model; (3) training the multimodal LSTM model by means of joint training; (4) gradually increasing the number of LSTM layers in the multimodal LSTM model, training after each layer is added, and finally obtaining an image description model with progressive multi-objective optimization and multi-layer probability fusion; and (5) fusing the probability scores output by the branches of the multi-layer LSTM network in this model, and outputting, by joint decision, the word with the maximum probability. Compared with the prior art, the method offers a deeper model, improved expression ability, effective parameter updating, and high accuracy.
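As a minimal sketch of the fusion in step (5), the snippet below averages the per-branch word distributions and takes the argmax; the branch count, the averaging rule, and all tensor names are assumptions, since the abstract does not specify how the probability scores are combined.

```python
# Hedged sketch of step (5): each LSTM layer's branch emits a probability
# distribution over the vocabulary for the next word; the branches are fused
# (here by simple averaging, one plausible reading of "fusing the probability
# scores") and the word with the maximum fused probability is emitted.
import torch

def fuse_and_decode(branch_logits: list[torch.Tensor]) -> torch.Tensor:
    """branch_logits: list of (batch, vocab_size) logits, one per LSTM branch.
    Returns the word indices chosen by the joint decision."""
    probs = [torch.softmax(logits, dim=-1) for logits in branch_logits]
    fused = torch.stack(probs, dim=0).mean(dim=0)  # average the branch scores
    return fused.argmax(dim=-1)                    # word with max probability

# Usage with three hypothetical branches over a 10,000-word vocabulary:
branches = [torch.randn(1, 10000) for _ in range(3)]
next_word = fuse_and_decode(branches)
```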

Description

Technical field

[0001] The invention relates to the field of image understanding, in particular to an image description generation method based on a deep LSTM network.

Background technique

[0002] Image caption generation is a very challenging task with broad application prospects in fields such as early childhood education, assistance for the visually impaired, and human-computer interaction. It combines natural language processing and computer vision to describe a natural image in the form of natural language, in effect translating the image into natural language. This first requires the system to accurately understand the content of the image, such as recognizing the scene, the various objects, their attributes, the ongoing actions, and the relationships between objects; it must then generate human-understandable sentences according to grammatical rules and language structure.

[0003] A variety of methods have been proposed to solve this problem, i...


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06K9/62; G06N3/04
CPC: G06N3/045; G06F18/251; G06F18/214; Y02T10/40
Inventors: 王瀚漓, 汤鹏杰
Owner: TONGJI UNIV