Image description generation method based on deep LSTM network

An image description and network technology, applied in the field of image understanding, that addresses problems such as insufficient multimodal information transformation, weak semantic information in generated sentences, and difficulty in improving overall performance, thereby improving semantic expression ability, preventing over-fitting, and achieving high accuracy.

Active Publication Date: 2017-05-10
TONGJI UNIV


Problems solved by technology

However, these methods rely heavily on upstream visual techniques, involve complicated processing pipelines, and leave the sentence-generating language model at the back end of the system insufficiently optimized. When LSTM units are used to generate sentences, the model is relatively shallow (one or two LSTM layers are typically used), so the multimodal information is not transformed through enough levels, the semantic information of the generated sentences is weak, and the overall performance is difficult to improve.


Examples


Embodiment

[0057] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments. This embodiment is carried out on the premise of the technical solution of the present invention, and a detailed implementation and a specific operating process are given, but the protection scope of the present invention is not limited to the following embodiments.

[0058] An image description generation method based on a deep LSTM network, as shown in Figure 3, Figure 4 and Figure 5, comprising the steps of:

[0059] 1) Construct a training set, a validation set and a test set, and use the GoogLeNet model to extract the CNN features of each image (see the sketch after these steps); the specific process includes:

[0060] 11) Convert the training set, validation set and test set into HDF5 format, where each image corresponds to multiple labels, and each label is a word in the reference sentence corresponding to the image;

[0061] 12) Read the image, scale it to a size of 256×256, t...
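The patent does not fix a framework for these steps (the HDF5 format suggests a Caffe-era pipeline), and the text is truncated after the 256×256 scaling. The following is a minimal sketch of steps 11) and 12) using PyTorch, torchvision and h5py; the file layout, the torchvision GoogLeNet backbone, the crop/normalization details, and the word-index vocabulary are all illustrative assumptions, not the patent's actual implementation.

```python
# Illustrative sketch of steps 11) and 12): convert (image, caption) pairs to
# HDF5 and extract GoogLeNet CNN features. Framework choice (PyTorch/h5py) and
# all names below are assumptions; the patent does not specify them.
import h5py
import torch
from torchvision import models, transforms
from PIL import Image

# Step 12) preprocessing: read an image and scale it to 256x256.
# The patent text is truncated after this point, so the crop and
# normalization below are conventional guesses.
preprocess = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.CenterCrop(224),          # GoogLeNet's expected input size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# GoogLeNet as a fixed feature extractor: drop the classifier head and
# keep the pooled 1024-d feature vector.
cnn = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)
cnn.fc = torch.nn.Identity()
cnn.eval()

def extract_cnn_feature(path: str) -> torch.Tensor:
    """Return the 1024-d GoogLeNet feature for one image."""
    img = Image.open(path).convert("RGB")
    with torch.no_grad():
        return cnn(preprocess(img).unsqueeze(0)).squeeze(0)

# Step 11): one HDF5 record per image; each label is the index of one word
# of the image's reference sentence (vocabulary assumed given).
def write_split(h5_path, samples, vocab):
    with h5py.File(h5_path, "w") as f:
        for i, (img_path, sentence) in enumerate(samples):
            grp = f.create_group(f"item_{i}")
            grp.create_dataset("feature",
                               data=extract_cnn_feature(img_path).numpy())
            grp.create_dataset("labels",
                               data=[vocab[w] for w in sentence.split()])
```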



Abstract

The invention relates to an image description generation method based on a deep LSTM network, comprising the following steps: (1) extracting the CNN features of the images in an image description dataset, and obtaining embedding vectors for each image and for the words of its reference description sentences; (2) building a double-layer LSTM network, and modeling it in series with the CNN network to generate a multimodal LSTM model; (3) training the multimodal LSTM model by means of joint training; (4) gradually increasing the number of LSTM layers in the multimodal LSTM model, training after each layer is added, and finally obtaining an image description model with progressive multi-objective optimization and multi-layer probability fusion; and (5) fusing the probability scores output by the branches of the multi-layer LSTM network in this model, and outputting, by joint decision, the word with the maximum probability. Compared with the prior art, the method offers a deeper model, improved expression ability, effective parameter updating, and high accuracy.
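As a minimal sketch of the fusion in step (5), the snippet below averages the per-branch word distributions and takes the argmax; the branch count, the averaging rule, and all tensor names are assumptions, since the abstract does not specify how the probability scores are combined.

```python
# Hedged sketch of step (5): each LSTM layer's branch emits a probability
# distribution over the vocabulary for the next word; the branches are fused
# (here by simple averaging, one plausible reading of "fusing the probability
# scores") and the word with the maximum fused probability is emitted.
import torch

def fuse_and_decode(branch_logits: list[torch.Tensor]) -> torch.Tensor:
    """branch_logits: list of (batch, vocab_size) logits, one per LSTM branch.
    Returns the word indices chosen by the joint decision."""
    probs = [torch.softmax(logits, dim=-1) for logits in branch_logits]
    fused = torch.stack(probs, dim=0).mean(dim=0)  # average the branch scores
    return fused.argmax(dim=-1)                    # word with max probability

# Usage with three hypothetical branches over a 10,000-word vocabulary:
branches = [torch.randn(1, 10000) for _ in range(3)]
next_word = fuse_and_decode(branches)
```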

Description

Technical field

[0001] The invention relates to the field of image understanding, in particular to an image description generation method based on a deep LSTM network.

Background technique

[0002] Image caption generation is a very challenging task with broad application prospects in fields such as early childhood education, assistance for the visually impaired, and human-computer interaction. It combines natural language processing and computer vision to describe a natural image in the form of natural language, in effect translating the image into natural language. This first requires the system to accurately understand the content of the image, such as recognizing the scene, the various objects, their attributes, the ongoing actions, and the relationships between objects; it must then generate human-understandable sentences according to grammatical rules and language structure.

[0003] A variety of methods have been proposed to solve this problem, i...


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06K9/62; G06N3/04
CPC: G06N3/045; G06F18/251; G06F18/214; Y02T10/40
Inventors: 王瀚漓, 汤鹏杰
Owner: TONGJI UNIV