Image text description method based on knowledge transfer multi-modal recurrent neural network

The method applies multi-modal recurrent neural network technology to character and pattern recognition, instruments, computer parts, and related fields. It addresses the problems of limited datasets, high dataset cost, and irrelevant information in generated descriptions, and produces sentences with appropriate semantics, accurate grammatical structure, and good readability.

Active Publication Date: 2017-05-10
SYSU CMU SHUNDE INT JOINT RES INST +1
5 Cites, 39 Cited by

AI Technical Summary

Problems solved by technology

[0004] However, because this method can only be applied to existing datasets of paired images and text descriptions, it cannot identify new objects in an image that never appear among the words of the text description data. As a result, the information described by the generated sentence may not be related to the informati


Examples

Embodiment 1

[0048] As shown in Figure 1, an image text description method based on a knowledge transfer multimodal recurrent neural network includes the following steps:

[0049] S1: Train an image semantic classifier in the server;

[0050] S2: Train the language model in the server;

[0051] S3: Pre-train the text description generation model in the server and generate description sentences.

[0052] The specific process of step S1 is as follows:

[0053] S11: Collect multiple image datasets: download ready-made datasets, including ImageNet and MSCOCO. Because MSCOCO is a dataset of paired images and text descriptions, only its image part is used;

[0054] S12: Use a convolutional neural network to extract the corresponding image feature f_I for each picture in the collected datasets;

[0055] S13: Build a label set: select the 1000 most common words, which cover 90% of the words used in the training set of paired images and text descriptions, and add objects that do not appe...
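The word-selection part of step S13 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `build_label_set`, the whitespace tokenization, the toy captions, and the reading of "cover 90% of the words" as a share of token occurrences are all assumptions. The truncated remainder of S13 (adding further objects to the label set) is not covered.

```python
from collections import Counter


def build_label_set(captions, top_n=1000):
    """Return the top_n most frequent words and the share of tokens they cover.

    Hypothetical helper: the source only states that the 1000 most common
    words are selected and that they cover 90% of the words used.
    """
    # Naive lowercase + whitespace tokenization (an assumption; the source
    # does not say how captions are tokenized).
    counts = Counter(
        word for caption in captions for word in caption.lower().split()
    )
    total_tokens = sum(counts.values())
    most_common = counts.most_common(top_n)
    labels = [word for word, _ in most_common]
    covered_tokens = sum(freq for _, freq in most_common)
    # Coverage is measured here as the fraction of token occurrences that
    # fall inside the label set (one possible reading of "cover 90%").
    coverage = covered_tokens / total_tokens if total_tokens else 0.0
    return labels, coverage


# Toy captions standing in for the paired training set (hypothetical data).
captions = [
    "a dog runs on the grass",
    "a cat sits on the sofa",
    "a dog sits on the grass",
]
labels, coverage = build_label_set(captions, top_n=3)
print(labels, coverage)
```

On a real caption corpus, `top_n` would be set to 1000 and the returned coverage could be checked against the 90% figure stated in the step.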

Abstract

The invention provides an image text description method based on a knowledge transfer multi-modal recurrent neural network. Through a knowledge transfer model in the multi-modal unit, the method makes good use of both the ability of an existing image classifier to recognize most objects and the grammatical structures and semantic associations in an existing corpus. As a result, the target object in an image can be described more accurately, and the generated sentences are richer in grammatical structure, more appropriate in semantics, and more readable.

Description

Technical field

[0001] The present invention relates to the fields of machine vision and pattern recognition, and more specifically to an image text description method based on a knowledge transfer multimodal recurrent neural network.

Background

[0002] In recent years, the rapid development of natural language processing based on recurrent neural networks and of image classification based on convolutional neural networks has led to the wide adoption of image understanding technology that uses deep neural networks. As a technology that connects two major fields of artificial intelligence (computer vision and natural language processing), the automatic generation of image text descriptions has attracted growing attention and research.

[0003] For ordinary image text description generation, good results have been achieved so far. For example, in 2015, Junhua Mao et al. proposed an image description model based on multimodal recurrent neural network (m-RNN...

Application Information

IPC(8): G06K9/46, G06K9/62
CPC: G06V10/424, G06F18/24, G06F18/214
Inventor: 胡海峰, 张俊轩, 王腾, 杨梁, 王伟轩
Owner: SYSU CMU SHUNDE INT JOINT RES INST