Image text description method based on knowledge transfer multi-modal recurrent neural network

A recurrent neural network and multi-modal technology, applied in character and pattern recognition, instruments, computer parts, etc., that addresses the problems of limited datasets, high dataset production cost, and generated descriptions unrelated to the image, achieving sentence descriptions with appropriate semantics, accurate grammatical structure, and good readability.

Active Publication Date: 2017-05-10

SYSU CMU SHUNDE INT JOINT RES INST +1

5 Cites, 39 Cited by

AI Technical Summary
Problems solved by technology

[0004] However, this method can only be applied to existing datasets of paired images and text descriptions. It cannot identify new objects in an image that do not appear among the words of the text description data, so the information described by the generated sentence may be unrelated to the information presented by the image.

Moreover, because datasets of paired images and text descriptions are limited, they cannot cover most of the objects that appear in images. Building such a dataset also requires the image information to match the text information, which has to be done manually, so these datasets are costly to produce.

Examples

Embodiment 1

[0048] As shown in Figure 1, an image text description method based on a knowledge transfer multimodal recurrent neural network includes the following steps:

[0049] S1: Train an image semantic classifier in the server;

[0050] S2: Train the language model in the server;

[0051] S3: Pre-train the text description generation model in the server and generate description sentences.

[0052] The specific process of step S1 is as follows:

[0053] S11: Collect multiple image datasets: download ready-made datasets, including ImageNet and MSCOCO. Since MSCOCO is a dataset of paired images and text descriptions, only its image part is used;

[0054] S12: Use a convolutional neural network to extract the corresponding image feature f_I for each picture in the collected datasets;

[0055] S13: Make a label set: select the 1000 most common words, which cover 90% of the words used in the training set of paired images and text descriptions, and add objects that do not appe...
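The word-selection part of step S13 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function name `build_label_set`, the whitespace tokenization, the `extra_labels` parameter, and the toy captions are all assumptions made for illustration. It counts word frequencies over the captions, keeps the most common words, reports the share of word occurrences they cover, and appends labels for objects that are absent from the captions.

```python
from collections import Counter


def build_label_set(captions, max_words=1000, extra_labels=()):
    """Return (labels, coverage) for a list of caption strings.

    labels   -- the max_words most frequent words, followed by any extra labels
    coverage -- fraction of all word occurrences accounted for by the top words
    """
    # Naive lowercase + whitespace tokenization (an assumption of this sketch).
    counts = Counter(word for caption in captions for word in caption.lower().split())
    total = sum(counts.values())
    top = counts.most_common(max_words)
    labels = [word for word, _ in top]
    coverage = sum(n for _, n in top) / total if total else 0.0
    # Hypothetical input: labels for objects that never occur in the captions.
    for label in extra_labels:
        if label not in labels:
            labels.append(label)
    return labels, coverage


# Toy captions, invented for illustration only.
captions = [
    "a dog runs on the grass",
    "a cat sits on the sofa",
    "a dog sleeps on the sofa",
]
labels, coverage = build_label_set(captions, max_words=4, extra_labels=["zebra"])
print(labels)
print(f"coverage of caption words: {coverage:.0%}")
```

With a real caption corpus, `max_words=1000` would correspond to the label-set size named in S13, and `coverage` could be compared against the 90% figure the text quotes.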

Abstract

The invention provides an image text description method based on a knowledge transfer multi-modal recurrent neural network. Through a knowledge transfer model in the multi-modal unit, the method makes good use of the ability of existing image classifiers to recognize most objects and of the grammatical structures and semantic associations in existing corpora. As a result, target objects in an image are described more accurately, and the generated sentences have richer grammatical structure, appropriate semantics, and higher readability.

Description

Technical field

[0001] The present invention relates to the fields of machine vision and pattern recognition, and more specifically to an image text description method based on a knowledge transfer multimodal recurrent neural network.

Background technique

[0002] In recent years, rapid progress in natural language processing based on recurrent neural networks and in image classification based on convolutional neural networks has led to the wide adoption of image understanding techniques that use deep neural networks. As a technology that connects two major fields of artificial intelligence (computer vision and natural language processing), the automatic generation of image text descriptions has attracted growing attention and research.

[0003] Ordinary image text description generation has already achieved good results. For example, in 2015, Junhua Mao et al. proposed an image description model based on multimodal recurrent neural network (m-RNN...

Application Information

IPC(8): G06K9/46, G06K9/62

CPC: G06V10/424, G06F18/24, G06F18/214

Inventor: 胡海峰, 张俊轩, 王腾, 杨梁, 王伟轩

Owner: SYSU CMU SHUNDE INT JOINT RES INST
